Hermes Agent

Interaction Sequences — v0.14.0 — Nous Research

How messages, tools, and decisions flow through time. These sequence diagrams trace the lifecycle of every interaction — from a user typing in the CLI, through LLM inference and tool dispatch, to error recovery and subagent delegation.

Key Sequences

Entry Points

Nested Loops

Compression Phases

Recovery Strategies

Delegation Depth

Overview

Reading these diagrams

Each diagram shows participants as vertical lifelines. Arrows represent messages or calls between components. Colored boxes (activate/deactivate) show when a component is actively processing. Fragments like alt opt loop break show conditional and iterative behavior. Notes provide context on key decisions.

Zoom & Pan: Ctrl/Cmd + wheel to zoom. Scroll or drag to pan. Double-click to fit. Use the controls in the top-right corner of each diagram.

Sequence 1

CLI Message Flow

What happens when you type a message in the terminal. The CLI reads input from a pending queue, determines if it is a slash command or chat message, then routes through credential checks, model resolution, and agent execution. The happy path ends with an LLM response displayed in a formatted box.

Participants

User — Human at the terminal
HermesCLI — CLI process loop
AIAgent — Core agent engine
LLMProvider — Remote inference API
ToolRegistry — Tool dispatch system
Tool — Individual tool handler

Key Steps

Read from _pending_input queue
Slash commands dispatch via resolve_command()
Chat: ensure credentials, resolve model config
Init AIAgent if needed, spawn agent thread
Monitor interrupt queue during execution
Display response box, update history

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 2

Gateway Message Flow

Platform messages from Telegram, Discord, Slack, and other adapters flow through the GatewayRunner. The runner handles authorization, session management, transcript persistence, and context compression — enabling stateful multi-turn conversations across platforms.

Participants

Platform — External messaging service
Adapter — Platform-specific adapter
GatewayRunner — Central message handler
SessionStore — Persistent session state
AIAgent — Core agent engine
LLMProvider — Remote inference API

Session Lifecycle

Create or resume session from session_store
Load conversation history from transcript
Hygiene-compress if context exceeds 85% (safety net; the agent's own compressor triggers earlier at ~50%, see Compression sequence)
Register agent in _running_agents map
Execute and send response via adapter
Persist transcript, clean up agent entry

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 2b

TUI Message Flow

The TUI gateway supports two transports: stdio (the Ink terminal UI spawns tui_gateway as a child process) and WebSocket (callers connect to tui_gateway/ws.py via the dashboard's /api/ws sidecar). Both use the same JSON-RPC wire protocol; the transport layer (tui_gateway/transport.py) abstracts the I/O so the dispatcher is shared. The dashboard's primary chat input/output runs over a separate /api/pty PTY-WS bridge (see Web Dashboard Flow below); the JSON-RPC sidecar carries structured metadata (tool-call sidebar, slash launcher, model badge) bound to the same session.

Participants

User — Human at the terminal
TUI (Ink/React) — TypeScript terminal UI
GatewayClient — JSON-RPC bridge in Node.js
TUIGateway — Python JSON-RPC server (child process)
AIAgent — Core agent engine
LLMProvider — Remote inference API

Key Steps

TUI spawns tui_gateway as stdio child process
Web dashboard connects via WebSocket to /api/ws
User input serialized as JSON-RPC (same wire format on both transports)
Gateway creates AIAgent, calls run_conversation()
Streaming tokens sent back as JSON-RPC notifications
Slash commands routed to persistent SlashWorker subprocess
TUI/browser renders markdown, thinking blocks, and tool activity

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 2c

Web Dashboard Flow

The web dashboard is a React 19 + Vite SPA served by a FastAPI backend. Its chat page reuses the TUI gateway over WebSocket — the browser connects to the same JSON-RPC server that drives the terminal TUI. REST endpoints handle config, sessions, and analytics.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 2d

ACP Editor Flow

The ACP (Agent Client Protocol) server lets editors like VS Code, Cursor, and Zed drive Hermes as a coding agent. The editor manages sessions, sends prompts, receives streaming events (text, thinking, tool progress), and handles approval requests for dangerous commands — all over stdio JSON-RPC.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 2e

MCP Server Flow

The MCP server exposes gateway conversations as tools that any MCP client (Claude Code, Cursor, Codex) can consume. It reads from SessionDB and the gateway sessions index — it doesn’t run an agent itself. An EventBridge polls for new messages and streams updates to connected clients.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 2f

Cron Job Execution

The gateway daemon runs a 60-second tick in a background thread (cron/scheduler.py). Each tick acquires a non-blocking file lock so only one tick runs at a time across overlapping processes, scans ~/.hermes/cron/jobs.json for due jobs, advances next_run_at under the lock (at-most-once semantics), then dispatches each job to a worker thread that builds an isolated AIAgent and delivers the response to the configured target.

Participants

Ticker — Gateway background thread (60s loop)
Scheduler — tick() in cron/scheduler.py
jobs.json — persistent job definitions
Lock — ~/.hermes/cron/.tick.lock (fcntl/msvcrt)
Worker — ThreadPoolExecutor worker per job
AIAgent — fresh isolated session
Delivery — live platform adapter or local sink

Key Steps

Non-blocking flock — skip tick if another holds the lock
Load due jobs from jobs.json
Advance next_run_at first, under the lock — at-most-once
Parallel dispatch (HERMES_CRON_MAX_PARALLEL, unbounded default)
Per job: build prompt with skills + prompt-injection scan
Run isolated AIAgent in a fresh session (no shared state with user sessions)
Deliver final response via gateway adapter or local; [SILENT] marker suppresses delivery

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 3

The Agent Loop

Inside run_conversation() lives the core agent loop. It manages system prompt construction, preflight compression, the main iteration loop with LLM calls, tool execution, and error recovery. The nested retry loop handles transient failures with credential rotation, auth refresh, and fallback activation.

Setup Phase

Restore primary runtime if fallback was active
Build or restore system prompt
Preflight compression up to 3 passes
Fire pre_llm_call plugin hook

Main Loop

Check interrupt, consume budget
Prepare messages with memory and caching
Retry loop with max 3 attempts
Tool calls: validate then execute

Error Recovery

Credential rotation on auth errors
Context compression on size errors
Fallback chain activation on failure
Save trajectory, persist session

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 4

Context Compression

When context grows beyond the threshold (default 50% of the context window), the compression algorithm runs in four distinct phases. It prunes old tool results, determines safe boundaries, generates an LLM-powered summary, and assembles the compressed history with proper tool-call pair sanitization.

Trigger Points

Preflight: before the main agent loop starts
In-loop: after tool calls if context grows
Error recovery: on 413 or context-length errors

Failure Mode

If summary generation fails, drops middle without summary
Graceful degradation preserves head and tail
Orphaned tool-call pairs sanitized automatically

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 5

Tool Execution

Tool dispatch follows a two-tier architecture. Agent-level tools (todo, memory, session_search, clarify, delegate) are handled directly inside the agent. Everything else routes through the ToolRegistry. Batch tool calls can execute in parallel via ThreadPoolExecutor when the parallelization check passes.

Agent-Level Tools

todo — Task tracking
memory — Persistent memory ops
session_search — History search
clarify — Ask user for clarification
delegate — Spawn subagent

Registry Tools

Dispatched via registry.dispatch()
Args coerced via coerce_tool_args()
Plugin hooks: pre_tool_call, post_tool_call
ToolEntry looked up by name, handler invoked
Includes file, terminal, browser, search, etc.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 6

Subagent Delegation

The delegate tool spawns isolated child agents for focused subtasks. Children inherit a restricted toolset (no delegate, clarify, memory, send_message, or execute_code), run in quiet mode, and are limited to depth 1 by default (configurable up to 3 via delegation.max_spawn_depth). Up to 3 tasks execute concurrently via ThreadPoolExecutor.

Constraints

Max delegation depth: 1 by default (configurable up to 3)
Max concurrent tasks: 3
Blocked tools: delegate, clarify, memory, send_message, execute_code
Child runs with quiet_mode=True
Child skips context files and memory loading

Child Configuration

Toolset: intersection of parent tools minus blocked
skip_context_files=True
skip_memory=True
_delegate_depth = parent + 1
Registered in parent _active_children for interrupt propagation

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Sequence 7

Error Recovery and Fallback

When API calls fail, the agent cascades through eight recovery strategies before giving up. The fallback chain walks through configured alternative models and providers, swapping the client in-place. The primary runtime is automatically restored at the start of the next conversation turn.

Recovery Cascade

1. Surrogate sanitization (once)
2. Credential pool rotation
3. Provider-specific auth refresh
4. Context-length error: step down and compress
5. 413 payload too large: compress and retry
6. Rate limit 429: eager fallback
7. Non-retryable 4xx: try fallback then abort
8. Max retries: primary recovery then fallback then give up

Fallback Mechanics

_try_activate_fallback() walks the chain
Swaps model, provider, and client in-place
Updates compressor with new context limits
_restore_primary_runtime() called at next turn
Chain configured via fallback_model and fallback_providers

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Plugin Lifecycle Hooks. Plugins fire at specific points in the sequences above. See Data Model § Plugin System for the full 14-hook list and registration mechanism; the two cards below show where each category fires relative to the diagrams on this page.

Agent Loop Hooks

pre_llm_call — before each LLM API call
post_llm_call — after LLM response received
pre_api_request — before HTTP request to provider
post_api_request — after HTTP response
on_session_start / on_session_end
on_session_finalize / on_session_reset
on_memory_write — after memory tool writes
subagent_stop — when child agent finishes

Tool Execution Hooks

pre_tool_call — before each tool handler runs
post_tool_call — after tool returns result
transform_terminal_output — filter terminal output
transform_tool_result — modify any tool result