Hermes Agent
C4 System Architecture — v0.14.0 — Nous Research
Hermes is a Python-based AI agent framework that connects LLMs to the real world through 30+ tools, 27 messaging platforms, multiple execution backends, and a composable skill system. It runs as a CLI, a React terminal TUI, a web dashboard, a messaging gateway daemon, an ACP server for editors, and an MCP server for tool interop.
Who interacts with Hermes?
Four actor types drive Hermes, and it depends on seven categories of external systems. The diagram below shows the complete system boundary.
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
Six entry points, one core engine
Hermes deploys in six modes — CLI, TUI (React/Ink), Web Dashboard, Gateway, ACP Server, and MCP Server — all driving the same AIAgent core. The core delegates to an intelligence layer, a tool system, persistent state, and a cron scheduler.
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
prompt_toolkit REPL. Streaming output,
session picker, inline model switching.
ui-tui/). Communicates with a
Python JSON-RPC backend (tui_gateway/).
web/). Config, sessions,
logs, analytics. Plugin-extensible tabs. Chat tab embeds the TUI by
opening /api/pty on the FastAPI backend
(hermes_cli/web_server.py), which spawns
hermes --tui behind a POSIX pseudo-terminal; xterm.js renders
the ANSI in-browser. A second /api/ws JSON-RPC sidecar carries
structured metadata (tool-call sidebar, slash launcher, model badge).
acp_adapter/edit_approval.py)
bound via ContextVar gates write_file and patch;
policies: ask, workspace_session, session.
The conversation loop
AIAgent.run_conversation() is the heart of the system — a multi-turn loop
that calls the LLM, parses tool calls, executes them (optionally in parallel), checks
context limits, and loops until the model produces a final text response. It orchestrates
all agent intelligence components.
skills, memory
filter, schema
OpenAI/Anthropic SDK
+ native parsers
duplicate filtering
safety analysis
near token limit
identityagent name and personasoulpersonality / custom personaskillsavailable skill catalogmemoryfrozen snapshot at session startcontext files.hermes.md, AGENTS.md- Security scan for injection in context
- LLM-based mid-conversation compression
- Protect head messages (system prompt)
- Protect tail by token budget (~20K)
- Summarize middle into structured template
- Iterative re-compression support
- Uses cheap auxiliary model
- Multiple API keys per provider
- Round-robin key rotation
- Automatic retry on rate limits
- Per-provider pricing tables
- Real-time cost tracking
- Token counting and limits
- 90 iterations default (parent)
- Per-subagent independent budget
- Thread-safe counter
execute_codeiterations refunded
Pure stateless controller (agent/tool_guardrails.py) that
sits between response parsing and tool execution. Tracks per-turn observations
and returns decisions — no side effects.
- Detects repeated identical tool calls (idempotent loop detection)
- Classifies tools as idempotent vs mutating
- Fingerprints tool calls for duplicate filtering
- Returns guidance/warnings, never blocks directly
Registry, toolsets, and execution
Every tool self-registers into a singleton ToolRegistry at import time.
model_tools.py discovers all tool modules, resolves composable toolsets,
and provides the OpenAI-format schemas to the agent. Execution environments are
pluggable backends for the terminal tool.
See Data Model § Tool Registry
for schemas and toolset composition.
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
ThreadPoolExecutor.
Max 3 concurrent children. Max depth 1 by default
(configurable up to 3 via delegation.max_spawn_depth).
delegate, clarify, memory, send_message, execute_code.
All 30+ Built-in Tools
terminalexecute shell commandsread_fileread with line numberswrite_filecreate / overwrite filessearch_filesripgrep-backed searchpatchfuzzy find-and-replaceprocessmanage background procs
web_searchmulti-backend searchweb_extractpage content extractionbrowser_navigateopen URLsbrowser_clickinteract with elementsbrowser_snapshotaccessibility treebrowser_visionscreenshot + AI analysis
memorypersistent key-value storesession_searchsearch past sessionsdelegate_taskspawn subagentsexecute_codePython scriptingclarifyask user questionstodotask list managementcronjobschedule taskssend_messagecross-platform delivery
skill_viewload skill contentskill_managecreate/edit/deleteskills_listbrowse catalog
vision_analyzeimage understandingimage_generateFal.ai (default FLUX 2 Klein 9B)text_to_speechmulti-provider TTS
mcp_toolcall external MCP servershomeassistantsmart home controlhoncho_toolsAI-native memory (plugin)
27 platforms, one session layer
The Gateway daemon multiplexes across all messaging platforms. GatewayRunner
manages platform adapter lifecycles, caches AIAgent instances per session for prefix-cache
efficiency, and routes output via DeliveryRouter.
- Platform adapter lifecycle via
PlatformRegistry - AIAgent cache per session
- Interrupt and approval handling
- Background platform reconnection
- Spawns cron tick every 60s
- Conversation persistence to disk
- Reset policy evaluation
- Platform tagging per session
- PII hashing (phone IDs)
"origin"→ back to source"telegram:123"→ explicit target- Home channel routing
- Deduplication
Built-in adapters live in gateway/platforms/. Plugin adapters
(Google Chat, IRC, Line, Simplex, Teams) live in plugins/platforms/
and register dynamically via PlatformRegistry.
Pluggable memory, built-in state
- WAL mode for concurrent readers
- Schema v11 with migrations
sessionstable — model, tokens, costs, titlemessagestable — role, content, tool_callsmessages_fts— FTS5 full-text search- Application-level retry with jitter
- Compression-triggered session splitting
MEMORY.md— agent knowledge notesUSER.md— user profile and prefs- §-delimited entries, character-limited
- Snapshot frozen at session start
- Security scan on writes
- Wrapped as
builtin_memory_provider.py
One external provider active alongside built-in memory, managed by agent/memory_manager.py.
Providers live in plugins/memory/<name>/ with lifecycle hooks
(on_pre_compress(), on_delegation(), session-scoped session_id).
See Data Model § Memory Providers
for the class hierarchy and interface contract.
Separate kanban.db SQLite database — profile-agnostic, shared
across all Hermes profiles on one machine. CAS-based claim protocol with 15-min TTL
and heartbeat extension. The dispatcher (embedded in gateway) polls ready tasks,
spawns worker agents with kanban_tools (gated by
HERMES_KANBAN_TASK env var — invisible in normal sessions). Dashboard plugin
provides a FastAPI router + WebSocket event stream for live board updates.
See Data Model § Entity Map
for the full schema.
tasks— title, assignee, status (7 states), priority, workspacetask_runs— attempt history with heartbeat and outcometask_events— append-only audit log (tailed by WebSocket)task_comments— threaded discussion per tasktask_links— parent-child DAG for subtasks
hermes kanbanCLI — human board management/kanbanslash command — gateway accessDashboard plugin— web UI tab with live updates- Dispatcher — embedded in gateway, spawns worker agents
- YAML frontmatter + Markdown body
- Categories, templates, scripts, references
- Skills Hub for community sharing
- Optional skills directory
config.yaml— main settings + memory provider selection.env— API keys (0600 perms)auth.json— credential pool statehermes memory setup— interactive provider wizardhermes memory status/hermes memory off
- Parallel batch processing via
multiprocessing.Pool - Dataset loading (JSONL) with configurable batch size
- Checkpointing for fault tolerance and resumption
- Trajectory saving + tool usage aggregation
- Toolset distribution selection per run
locales/— 16 language YAML catalogs- Scope: CLI approval prompts + select gateway slash replies
- Agent output, logs, tool results stay English
- Parity enforced by
tests/agent/test_i18n.py
How messages travel through the system
Web / Gateway
response
safe
terminal
Discord/...
normalize
→ Adapter.send()
background thread
prevents races
find due
+ skill prompts
or local
google/gemini-3-flash-preview via OpenRouter.
Each task is independently configurable in config.yaml.
What powers Hermes under the hood
Python ≥3.11— primary languageopenai+anthropicSDK — LLM clientshttpx— async HTTP clientprompt_toolkit— classic CLIReact+Ink— TUI (TypeScript)React 19+Vite— Web dashboardSQLiteWAL + FTS5 — state persistenceYAML+.env— configuration
ACP— Agent Client Protocol (editor integration)MCP— Model Context Protocol (tool interop)OpenAI API— standard tool calling formatAnthropic API— native adapter with caching
python-telegram-botdiscord.py(with voice)slack-bolt+slack-sdkmatrix-nio(E2E encryption)dingtalk-stream,lark-oapi
Fal.ai— image generationElevenLabs/edge-tts/OpenAI TTSfaster-whisper— local speech-to-textFirecrawl,Exa,Tavily— web backendsModal,Daytona— cloud compute