Hermes Agent

Interaction Sequences — v0.14.0 — Nous Research

How messages, tools, and decisions flow through time. These sequence diagrams trace the lifecycle of every interaction — from a user typing in the CLI, through LLM inference and tool dispatch, to error recovery and subagent delegation.

12
Key Sequences
6
Entry Points
3
Nested Loops
4
Compression Phases
8
Recovery Strategies
1
Delegation Depth
Overview

Reading these diagrams

Each diagram shows participants as vertical lifelines. Arrows represent messages or calls between components. Colored boxes (activate/deactivate) show when a component is actively processing. Fragments like alt opt loop break show conditional and iterative behavior. Notes provide context on key decisions.

Zoom & Pan: Ctrl/Cmd + wheel to zoom. Scroll or drag to pan. Double-click to fit. Use the controls in the top-right corner of each diagram.
Sequence 1

CLI Message Flow

What happens when you type a message in the terminal. The CLI reads input from a pending queue, determines if it is a slash command or chat message, then routes through credential checks, model resolution, and agent execution. The happy path ends with an LLM response displayed in a formatted box.

Participants
  • User — Human at the terminal
  • HermesCLI — CLI process loop
  • AIAgent — Core agent engine
  • LLMProvider — Remote inference API
  • ToolRegistry — Tool dispatch system
  • Tool — Individual tool handler
Key Steps
  • Read from _pending_input queue
  • Slash commands dispatch via resolve_command()
  • Chat: ensure credentials, resolve model config
  • Init AIAgent if needed, spawn agent thread
  • Monitor interrupt queue during execution
  • Display response box, update history

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 2

Gateway Message Flow

Platform messages from Telegram, Discord, Slack, and other adapters flow through the GatewayRunner. The runner handles authorization, session management, transcript persistence, and context compression — enabling stateful multi-turn conversations across platforms.

Participants
  • Platform — External messaging service
  • Adapter — Platform-specific adapter
  • GatewayRunner — Central message handler
  • SessionStore — Persistent session state
  • AIAgent — Core agent engine
  • LLMProvider — Remote inference API
Session Lifecycle
  • Create or resume session from session_store
  • Load conversation history from transcript
  • Hygiene-compress if context exceeds 85% (safety net; the agent's own compressor triggers earlier at ~50%, see Compression sequence)
  • Register agent in _running_agents map
  • Execute and send response via adapter
  • Persist transcript, clean up agent entry

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 2b

TUI Message Flow

The TUI gateway supports two transports: stdio (the Ink terminal UI spawns tui_gateway as a child process) and WebSocket (callers connect to tui_gateway/ws.py via the dashboard's /api/ws sidecar). Both use the same JSON-RPC wire protocol; the transport layer (tui_gateway/transport.py) abstracts the I/O so the dispatcher is shared. The dashboard's primary chat input/output runs over a separate /api/pty PTY-WS bridge (see Web Dashboard Flow below); the JSON-RPC sidecar carries structured metadata (tool-call sidebar, slash launcher, model badge) bound to the same session.

Participants
  • User — Human at the terminal
  • TUI (Ink/React) — TypeScript terminal UI
  • GatewayClient — JSON-RPC bridge in Node.js
  • TUIGateway — Python JSON-RPC server (child process)
  • AIAgent — Core agent engine
  • LLMProvider — Remote inference API
Key Steps
  • TUI spawns tui_gateway as stdio child process
  • Web dashboard connects via WebSocket to /api/ws
  • User input serialized as JSON-RPC (same wire format on both transports)
  • Gateway creates AIAgent, calls run_conversation()
  • Streaming tokens sent back as JSON-RPC notifications
  • Slash commands routed to persistent SlashWorker subprocess
  • TUI/browser renders markdown, thinking blocks, and tool activity

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 2c

Web Dashboard Flow

The web dashboard is a React 19 + Vite SPA served by a FastAPI backend. Its chat page reuses the TUI gateway over WebSocket — the browser connects to the same JSON-RPC server that drives the terminal TUI. REST endpoints handle config, sessions, and analytics.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 2d

ACP Editor Flow

The ACP (Agent Client Protocol) server lets editors like VS Code, Cursor, and Zed drive Hermes as a coding agent. The editor manages sessions, sends prompts, receives streaming events (text, thinking, tool progress), and handles approval requests for dangerous commands — all over stdio JSON-RPC.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 2e

MCP Server Flow

The MCP server exposes gateway conversations as tools that any MCP client (Claude Code, Cursor, Codex) can consume. It reads from SessionDB and the gateway sessions index — it doesn’t run an agent itself. An EventBridge polls for new messages and streams updates to connected clients.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 2f

Cron Job Execution

The gateway daemon runs a 60-second tick in a background thread (cron/scheduler.py). Each tick acquires a non-blocking file lock so only one tick runs at a time across overlapping processes, scans ~/.hermes/cron/jobs.json for due jobs, advances next_run_at under the lock (at-most-once semantics), then dispatches each job to a worker thread that builds an isolated AIAgent and delivers the response to the configured target.

Participants
  • Ticker — Gateway background thread (60s loop)
  • Scheduler — tick() in cron/scheduler.py
  • jobs.json — persistent job definitions
  • Lock — ~/.hermes/cron/.tick.lock (fcntl/msvcrt)
  • Worker — ThreadPoolExecutor worker per job
  • AIAgent — fresh isolated session
  • Delivery — live platform adapter or local sink
Key Steps
  • Non-blocking flock — skip tick if another holds the lock
  • Load due jobs from jobs.json
  • Advance next_run_at first, under the lock — at-most-once
  • Parallel dispatch (HERMES_CRON_MAX_PARALLEL, unbounded default)
  • Per job: build prompt with skills + prompt-injection scan
  • Run isolated AIAgent in a fresh session (no shared state with user sessions)
  • Deliver final response via gateway adapter or local; [SILENT] marker suppresses delivery

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 3

The Agent Loop

Inside run_conversation() lives the core agent loop. It manages system prompt construction, preflight compression, the main iteration loop with LLM calls, tool execution, and error recovery. The nested retry loop handles transient failures with credential rotation, auth refresh, and fallback activation.

Setup Phase
  • Restore primary runtime if fallback was active
  • Build or restore system prompt
  • Preflight compression up to 3 passes
  • Fire pre_llm_call plugin hook
Main Loop
  • Check interrupt, consume budget
  • Prepare messages with memory and caching
  • Retry loop with max 3 attempts
  • Tool calls: validate then execute
Error Recovery
  • Credential rotation on auth errors
  • Context compression on size errors
  • Fallback chain activation on failure
  • Save trajectory, persist session

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 4

Context Compression

When context grows beyond the threshold (default 50% of the context window), the compression algorithm runs in four distinct phases. It prunes old tool results, determines safe boundaries, generates an LLM-powered summary, and assembles the compressed history with proper tool-call pair sanitization.

Trigger Points
  • Preflight: before the main agent loop starts
  • In-loop: after tool calls if context grows
  • Error recovery: on 413 or context-length errors
Failure Mode
  • If summary generation fails, drops middle without summary
  • Graceful degradation preserves head and tail
  • Orphaned tool-call pairs sanitized automatically

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 5

Tool Execution

Tool dispatch follows a two-tier architecture. Agent-level tools (todo, memory, session_search, clarify, delegate) are handled directly inside the agent. Everything else routes through the ToolRegistry. Batch tool calls can execute in parallel via ThreadPoolExecutor when the parallelization check passes.

Agent-Level Tools
  • todo — Task tracking
  • memory — Persistent memory ops
  • session_search — History search
  • clarify — Ask user for clarification
  • delegate — Spawn subagent
Registry Tools
  • Dispatched via registry.dispatch()
  • Args coerced via coerce_tool_args()
  • Plugin hooks: pre_tool_call, post_tool_call
  • ToolEntry looked up by name, handler invoked
  • Includes file, terminal, browser, search, etc.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 6

Subagent Delegation

The delegate tool spawns isolated child agents for focused subtasks. Children inherit a restricted toolset (no delegate, clarify, memory, send_message, or execute_code), run in quiet mode, and are limited to depth 1 by default (configurable up to 3 via delegation.max_spawn_depth). Up to 3 tasks execute concurrently via ThreadPoolExecutor.

Constraints
  • Max delegation depth: 1 by default (configurable up to 3)
  • Max concurrent tasks: 3
  • Blocked tools: delegate, clarify, memory, send_message, execute_code
  • Child runs with quiet_mode=True
  • Child skips context files and memory loading
Child Configuration
  • Toolset: intersection of parent tools minus blocked
  • skip_context_files=True
  • skip_memory=True
  • _delegate_depth = parent + 1
  • Registered in parent _active_children for interrupt propagation

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Sequence 7

Error Recovery and Fallback

When API calls fail, the agent cascades through eight recovery strategies before giving up. The fallback chain walks through configured alternative models and providers, swapping the client in-place. The primary runtime is automatically restored at the start of the next conversation turn.

Recovery Cascade
  • 1. Surrogate sanitization (once)
  • 2. Credential pool rotation
  • 3. Provider-specific auth refresh
  • 4. Context-length error: step down and compress
  • 5. 413 payload too large: compress and retry
  • 6. Rate limit 429: eager fallback
  • 7. Non-retryable 4xx: try fallback then abort
  • 8. Max retries: primary recovery then fallback then give up
Fallback Mechanics
  • _try_activate_fallback() walks the chain
  • Swaps model, provider, and client in-place
  • Updates compressor with new context limits
  • _restore_primary_runtime() called at next turn
  • Chain configured via fallback_model and fallback_providers

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Plugin Lifecycle Hooks. Plugins fire at specific points in the sequences above. See Data Model § Plugin System for the full 14-hook list and registration mechanism; the two cards below show where each category fires relative to the diagrams on this page.
Agent Loop Hooks
  • pre_llm_call — before each LLM API call
  • post_llm_call — after LLM response received
  • pre_api_request — before HTTP request to provider
  • post_api_request — after HTTP response
  • on_session_start / on_session_end
  • on_session_finalize / on_session_reset
  • on_memory_write — after memory tool writes
  • subagent_stop — when child agent finishes
Tool Execution Hooks
  • pre_tool_call — before each tool handler runs
  • post_tool_call — after tool returns result
  • transform_terminal_output — filter terminal output
  • transform_tool_result — modify any tool result