Hermes Agent
Interaction Sequences — v0.14.0 — Nous Research
How messages, tools, and decisions flow through time. These sequence diagrams trace the lifecycle of every interaction — from a user typing in the CLI, through LLM inference and tool dispatch, to error recovery and subagent delegation.
Reading these diagrams
Each diagram shows participants as vertical lifelines. Arrows represent messages or calls between components. Colored boxes (activate/deactivate) show when a component is actively processing. Fragments like alt opt loop break show conditional and iterative behavior. Notes provide context on key decisions.
CLI Message Flow
What happens when you type a message in the terminal. The CLI reads input from a pending queue, determines if it is a slash command or chat message, then routes through credential checks, model resolution, and agent execution. The happy path ends with an LLM response displayed in a formatted box.
User— Human at the terminalHermesCLI— CLI process loopAIAgent— Core agent engineLLMProvider— Remote inference APIToolRegistry— Tool dispatch systemTool— Individual tool handler
- Read from
_pending_inputqueue - Slash commands dispatch via
resolve_command() - Chat: ensure credentials, resolve model config
- Init AIAgent if needed, spawn agent thread
- Monitor interrupt queue during execution
- Display response box, update history
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
Gateway Message Flow
Platform messages from Telegram, Discord, Slack, and other adapters flow through the GatewayRunner. The runner handles authorization, session management, transcript persistence, and context compression — enabling stateful multi-turn conversations across platforms.
Platform— External messaging serviceAdapter— Platform-specific adapterGatewayRunner— Central message handlerSessionStore— Persistent session stateAIAgent— Core agent engineLLMProvider— Remote inference API
- Create or resume session from
session_store - Load conversation history from transcript
- Hygiene-compress if context exceeds 85% (safety net; the agent's own compressor triggers earlier at ~50%, see Compression sequence)
- Register agent in
_running_agentsmap - Execute and send response via adapter
- Persist transcript, clean up agent entry
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
TUI Message Flow
The TUI gateway supports two transports: stdio (the Ink terminal UI spawns
tui_gateway as a child process) and WebSocket
(callers connect to tui_gateway/ws.py via the dashboard's
/api/ws sidecar). Both use the same JSON-RPC wire protocol; the transport layer
(tui_gateway/transport.py) abstracts the I/O so the dispatcher
is shared. The dashboard's primary chat input/output runs over a separate
/api/pty PTY-WS bridge (see Web Dashboard Flow below); the JSON-RPC
sidecar carries structured metadata (tool-call sidebar, slash launcher, model badge)
bound to the same session.
User— Human at the terminalTUI (Ink/React)— TypeScript terminal UIGatewayClient— JSON-RPC bridge in Node.jsTUIGateway— Python JSON-RPC server (child process)AIAgent— Core agent engineLLMProvider— Remote inference API
- TUI spawns
tui_gatewayas stdio child process - Web dashboard connects via WebSocket to
/api/ws - User input serialized as JSON-RPC (same wire format on both transports)
- Gateway creates AIAgent, calls
run_conversation() - Streaming tokens sent back as JSON-RPC notifications
- Slash commands routed to persistent
SlashWorkersubprocess - TUI/browser renders markdown, thinking blocks, and tool activity
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
Web Dashboard Flow
The web dashboard is a React 19 + Vite SPA served by a FastAPI backend. Its chat page reuses the TUI gateway over WebSocket — the browser connects to the same JSON-RPC server that drives the terminal TUI. REST endpoints handle config, sessions, and analytics.
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
ACP Editor Flow
The ACP (Agent Client Protocol) server lets editors like VS Code, Cursor, and Zed drive Hermes as a coding agent. The editor manages sessions, sends prompts, receives streaming events (text, thinking, tool progress), and handles approval requests for dangerous commands — all over stdio JSON-RPC.
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
MCP Server Flow
The MCP server exposes gateway conversations as tools that any MCP client (Claude Code, Cursor, Codex) can consume. It reads from SessionDB and the gateway sessions index — it doesn’t run an agent itself. An EventBridge polls for new messages and streams updates to connected clients.
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
Cron Job Execution
The gateway daemon runs a 60-second tick in a background thread
(cron/scheduler.py).
Each tick acquires a non-blocking file lock so only one tick runs at a time
across overlapping processes, scans ~/.hermes/cron/jobs.json for due jobs,
advances next_run_at under the lock (at-most-once semantics),
then dispatches each job to a worker thread that builds an isolated AIAgent
and delivers the response to the configured target.
Ticker— Gateway background thread (60s loop)Scheduler— tick() in cron/scheduler.pyjobs.json— persistent job definitionsLock— ~/.hermes/cron/.tick.lock (fcntl/msvcrt)Worker— ThreadPoolExecutor worker per jobAIAgent— fresh isolated sessionDelivery— live platform adapter or local sink
- Non-blocking flock — skip tick if another holds the lock
- Load due jobs from
jobs.json - Advance
next_run_atfirst, under the lock — at-most-once - Parallel dispatch (
HERMES_CRON_MAX_PARALLEL, unbounded default) - Per job: build prompt with skills + prompt-injection scan
- Run isolated AIAgent in a fresh session (no shared state with user sessions)
- Deliver final response via gateway adapter or local;
[SILENT]marker suppresses delivery
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
The Agent Loop
Inside run_conversation() lives the core agent loop. It manages system prompt
construction, preflight compression, the main iteration loop with LLM calls, tool
execution, and error recovery. The nested retry loop handles transient failures with
credential rotation, auth refresh, and fallback activation.
- Restore primary runtime if fallback was active
- Build or restore system prompt
- Preflight compression up to 3 passes
- Fire
pre_llm_callplugin hook
- Check interrupt, consume budget
- Prepare messages with memory and caching
- Retry loop with max 3 attempts
- Tool calls: validate then execute
- Credential rotation on auth errors
- Context compression on size errors
- Fallback chain activation on failure
- Save trajectory, persist session
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
Context Compression
When context grows beyond the threshold (default 50% of the context window), the compression algorithm runs in four distinct phases. It prunes old tool results, determines safe boundaries, generates an LLM-powered summary, and assembles the compressed history with proper tool-call pair sanitization.
- Preflight: before the main agent loop starts
- In-loop: after tool calls if context grows
- Error recovery: on 413 or context-length errors
- If summary generation fails, drops middle without summary
- Graceful degradation preserves head and tail
- Orphaned tool-call pairs sanitized automatically
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
Tool Execution
Tool dispatch follows a two-tier architecture. Agent-level tools (todo, memory, session_search, clarify, delegate) are handled directly inside the agent. Everything else routes through the ToolRegistry. Batch tool calls can execute in parallel via ThreadPoolExecutor when the parallelization check passes.
todo— Task trackingmemory— Persistent memory opssession_search— History searchclarify— Ask user for clarificationdelegate— Spawn subagent
- Dispatched via
registry.dispatch() - Args coerced via
coerce_tool_args() - Plugin hooks:
pre_tool_call,post_tool_call - ToolEntry looked up by name, handler invoked
- Includes file, terminal, browser, search, etc.
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
Subagent Delegation
The delegate tool spawns isolated child agents for focused subtasks. Children inherit
a restricted toolset (no delegate, clarify, memory, send_message, or execute_code),
run in quiet mode, and are limited to depth 1 by default (configurable up to 3 via
delegation.max_spawn_depth). Up to 3 tasks execute concurrently via
ThreadPoolExecutor.
- Max delegation depth: 1 by default (configurable up to 3)
- Max concurrent tasks: 3
- Blocked tools: delegate, clarify, memory, send_message, execute_code
- Child runs with
quiet_mode=True - Child skips context files and memory loading
- Toolset: intersection of parent tools minus blocked
skip_context_files=Trueskip_memory=True_delegate_depth = parent + 1- Registered in parent
_active_childrenfor interrupt propagation
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
Error Recovery and Fallback
When API calls fail, the agent cascades through eight recovery strategies before giving up. The fallback chain walks through configured alternative models and providers, swapping the client in-place. The primary runtime is automatically restored at the start of the next conversation turn.
- 1. Surrogate sanitization (once)
- 2. Credential pool rotation
- 3. Provider-specific auth refresh
- 4. Context-length error: step down and compress
- 5. 413 payload too large: compress and retry
- 6. Rate limit 429: eager fallback
- 7. Non-retryable 4xx: try fallback then abort
- 8. Max retries: primary recovery then fallback then give up
_try_activate_fallback()walks the chain- Swaps model, provider, and client in-place
- Updates compressor with new context limits
_restore_primary_runtime()called at next turn- Chain configured via
fallback_modelandfallback_providers
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
pre_llm_call— before each LLM API callpost_llm_call— after LLM response receivedpre_api_request— before HTTP request to providerpost_api_request— after HTTP responseon_session_start/on_session_endon_session_finalize/on_session_reseton_memory_write— after memory tool writessubagent_stop— when child agent finishes
pre_tool_call— before each tool handler runspost_tool_call— after tool returns resulttransform_terminal_output— filter terminal outputtransform_tool_result— modify any tool result