Hermes Agent

C4 System Architecture — v0.14.0 — Nous Research

Hermes is a Python-based AI agent framework that connects LLMs to the real world through 30+ tools, 27 messaging platforms, multiple execution backends, and a composable skill system. It runs as a CLI, a React terminal TUI, a web dashboard, a messaging gateway daemon, an ACP server for editors, and an MCP server for tool interop.

30+
Built-in Tools
27
Platform Adapters
30+
LLM Providers
8
Exec Backends
8
Memory Providers
6
Entry Points
Level 1 — System Context

Who interacts with Hermes?

Four actor types drive Hermes, and it depends on seven categories of external systems. The diagram below shows the complete system boundary.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Actors
Hermes System
External Systems
Level 2 — Container Diagram

Six entry points, one core engine

Hermes deploys in six modes — CLI, TUI (React/Ink), Web Dashboard, Gateway, ACP Server, and MCP Server — all driving the same AIAgent core. The core delegates to an intelligence layer, a tool system, persistent state, and a cron scheduler.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
CLI
Classic prompt_toolkit REPL. Streaming output, session picker, inline model switching.
TUI
React/Ink terminal UI (ui-tui/). Communicates with a Python JSON-RPC backend (tui_gateway/).
Web Dashboard
React 19 + Vite browser SPA (web/). Config, sessions, logs, analytics. Plugin-extensible tabs. Chat tab embeds the TUI by opening /api/pty on the FastAPI backend (hermes_cli/web_server.py), which spawns hermes --tui behind a POSIX pseudo-terminal; xterm.js renders the ANSI in-browser. A second /api/ws JSON-RPC sidecar carries structured metadata (tool-call sidebar, slash launcher, model badge).
Gateway
Long-running daemon. Bridges 22 messaging platforms. Manages sessions, cron, reconnection.
ACP Server
Agent Client Protocol for Cursor/VS Code/Zed. Sessions, prompts, fork, cancel, slash commands. Pre-execution edit approval guard (acp_adapter/edit_approval.py) bound via ContextVar gates write_file and patch; policies: ask, workspace_session, session.
MCP Server
Exposes gateway conversations as MCP tools. Claude Code, Cursor, Codex can consume them.
Level 3 — Core Agent Engine

The conversation loop

AIAgent.run_conversation() is the heart of the system — a multi-turn loop that calls the LLM, parses tool calls, executes them (optionally in parallel), checks context limits, and loops until the model produces a final text response. It orchestrates all agent intelligence components.

Conversation Loop
Build Prompt
Identity, persona,
skills, memory
Get Tools
Resolve toolsets,
filter, schema
Call LLM
Streaming via
OpenAI/Anthropic SDK
Parse Response
Text or tool_calls
+ native parsers
Guardrails
Loop detection,
duplicate filtering
Execute Tools
Parallel when safe,
safety analysis
Compress?
LLM-summarize if
near token limit
Prompt Builder
  • identity agent name and persona
  • soul personality / custom persona
  • skills available skill catalog
  • memory frozen snapshot at session start
  • context files .hermes.md, AGENTS.md
  • Security scan for injection in context
Context Compressor
  • LLM-based mid-conversation compression
  • Protect head messages (system prompt)
  • Protect tail by token budget (~20K)
  • Summarize middle into structured template
  • Iterative re-compression support
  • Uses cheap auxiliary model
Credential Pool
  • Multiple API keys per provider
  • Round-robin key rotation
  • Automatic retry on rate limits
Usage Pricing
  • Per-provider pricing tables
  • Real-time cost tracking
  • Token counting and limits
Iteration Budget
  • 90 iterations default (parent)
  • Per-subagent independent budget
  • Thread-safe counter
  • execute_code iterations refunded
Tool Guardrails

Pure stateless controller (agent/tool_guardrails.py) that sits between response parsing and tool execution. Tracks per-turn observations and returns decisions — no side effects.

  • Detects repeated identical tool calls (idempotent loop detection)
  • Classifies tools as idempotent vs mutating
  • Fingerprints tool calls for duplicate filtering
  • Returns guidance/warnings, never blocks directly
Level 3 — Tool System

Registry, toolsets, and execution

Every tool self-registers into a singleton ToolRegistry at import time. model_tools.py discovers all tool modules, resolves composable toolsets, and provides the OpenAI-format schemas to the agent. Execution environments are pluggable backends for the terminal tool. See Data Model § Tool Registry for schemas and toolset composition.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Execution Environment Backends
Local
Direct subprocess
|
Docker
Container isolation
|
SSH
Remote execution
|
Modal
Cloud GPU compute
|
Daytona
Cloud dev envs
|
Singularity
HPC containers
|
Vercel
Cloud sandboxes
Subagent Architecture (delegate_tool)
Spawning
Parent AIAgent spawns children via ThreadPoolExecutor. Max 3 concurrent children. Max depth 1 by default (configurable up to 3 via delegation.max_spawn_depth).
Isolation
Each child gets its own conversation, task_id, restricted toolset. Cannot: delegate, clarify, memory, send_message, execute_code.
All 30+ Built-in Tools
Terminal & File
  • terminal execute shell commands
  • read_file read with line numbers
  • write_file create / overwrite files
  • search_files ripgrep-backed search
  • patch fuzzy find-and-replace
  • process manage background procs
Web & Browser
  • web_search multi-backend search
  • web_extract page content extraction
  • browser_navigate open URLs
  • browser_click interact with elements
  • browser_snapshot accessibility tree
  • browser_vision screenshot + AI analysis
Agent & Memory
  • memory persistent key-value store
  • session_search search past sessions
  • delegate_task spawn subagents
  • execute_code Python scripting
  • clarify ask user questions
  • todo task list management
  • cronjob schedule tasks
  • send_message cross-platform delivery
Skills
  • skill_view load skill content
  • skill_manage create/edit/delete
  • skills_list browse catalog
Media
  • vision_analyze image understanding
  • image_generate Fal.ai (default FLUX 2 Klein 9B)
  • text_to_speech multi-provider TTS
Integration
  • mcp_tool call external MCP servers
  • homeassistant smart home control
  • honcho_tools AI-native memory (plugin)
Level 3 — Gateway

27 platforms, one session layer

The Gateway daemon multiplexes across all messaging platforms. GatewayRunner manages platform adapter lifecycles, caches AIAgent instances per session for prefix-cache efficiency, and routes output via DeliveryRouter.

GatewayRunner
  • Platform adapter lifecycle via PlatformRegistry
  • AIAgent cache per session
  • Interrupt and approval handling
  • Background platform reconnection
  • Spawns cron tick every 60s
SessionStore
  • Conversation persistence to disk
  • Reset policy evaluation
  • Platform tagging per session
  • PII hashing (phone IDs)
DeliveryRouter
  • "origin" → back to source
  • "telegram:123" → explicit target
  • Home channel routing
  • Deduplication
Platform Adapters (BasePlatformAdapter + PlatformRegistry)
Telegram Discord Slack WhatsApp Signal Matrix BlueBubbles Mattermost Email SMS Home Assistant Webhook API Server DingTalk Feishu WeCom Weixin QQBot Yuanbao Telegram Network IRC Teams

Built-in adapters live in gateway/platforms/. Plugin adapters (Google Chat, IRC, Line, Simplex, Teams) live in plugins/platforms/ and register dynamically via PlatformRegistry.

State & Persistence

Pluggable memory, built-in state

SessionDB (SQLite)
  • WAL mode for concurrent readers
  • Schema v11 with migrations
  • sessions table — model, tokens, costs, title
  • messages table — role, content, tool_calls
  • messages_fts — FTS5 full-text search
  • Application-level retry with jitter
  • Compression-triggered session splitting
Built-in Memory (always on)
  • MEMORY.md — agent knowledge notes
  • USER.md — user profile and prefs
  • §-delimited entries, character-limited
  • Snapshot frozen at session start
  • Security scan on writes
  • Wrapped as builtin_memory_provider.py
Pluggable Memory Providers (MemoryProvider ABC)

One external provider active alongside built-in memory, managed by agent/memory_manager.py. Providers live in plugins/memory/<name>/ with lifecycle hooks (on_pre_compress(), on_delegation(), session-scoped session_id). See Data Model § Memory Providers for the class hierarchy and interface contract.

Honcho OpenViking Mem0 Hindsight Holographic RetainDB ByteRover Supermemory
Kanban Board (multi-agent task coordination)

Separate kanban.db SQLite database — profile-agnostic, shared across all Hermes profiles on one machine. CAS-based claim protocol with 15-min TTL and heartbeat extension. The dispatcher (embedded in gateway) polls ready tasks, spawns worker agents with kanban_tools (gated by HERMES_KANBAN_TASK env var — invisible in normal sessions). Dashboard plugin provides a FastAPI router + WebSocket event stream for live board updates. See Data Model § Entity Map for the full schema.

Entities
  • tasks — title, assignee, status (7 states), priority, workspace
  • task_runs — attempt history with heartbeat and outcome
  • task_events — append-only audit log (tailed by WebSocket)
  • task_comments — threaded discussion per task
  • task_links — parent-child DAG for subtasks
Surfaces
  • hermes kanban CLI — human board management
  • /kanban slash command — gateway access
  • Dashboard plugin — web UI tab with live updates
  • Dispatcher — embedded in gateway, spawns worker agents
Skills System
  • YAML frontmatter + Markdown body
  • Categories, templates, scripts, references
  • Skills Hub for community sharing
  • Optional skills directory
Configuration
  • config.yaml — main settings + memory provider selection
  • .env — API keys (0600 perms)
  • auth.json — credential pool state
  • hermes memory setup — interactive provider wizard
  • hermes memory status / hermes memory off
Batch Runner
  • Parallel batch processing via multiprocessing.Pool
  • Dataset loading (JSONL) with configurable batch size
  • Checkpointing for fault tolerance and resumption
  • Trajectory saving + tool usage aggregation
  • Toolset distribution selection per run
Localization
  • locales/ — 16 language YAML catalogs
  • Scope: CLI approval prompts + select gateway slash replies
  • Agent output, logs, tool results stay English
  • Parity enforced by tests/agent/test_i18n.py
Key Data Flows

How messages travel through the system

Interactive CLI Session
User types
CLI / TUI /
Web / Gateway
AIAgent
run_conversation()
LLM Call
streaming
response
Tool Exec
parallel when
safe
Response
stream to
terminal
Save
SessionDB
Gateway Message Flow
Platform Msg
Telegram/
Discord/...
Adapter
receive +
normalize
SessionStore
load / create
AIAgent
run_conversation()
Deliver
DeliveryRouter
→ Adapter.send()
Cron Job Execution
60s Tick
Gateway daemon
background thread
File Lock
.tick.lock
prevents races
Scan Jobs
jobs.json
find due
Spawn Agent
isolated session
+ skill prompts
Deliver
target platform
or local
Auxiliary Model System. Side tasks use separate model configs, not the main conversation model. See Data Model § Config Hierarchy for the full task list and per-task timeouts. Default fallback: google/gemini-3-flash-preview via OpenRouter. Each task is independently configurable in config.yaml.
Technology Stack

What powers Hermes under the hood

Core Runtime
  • Python ≥3.11 — primary language
  • openai + anthropic SDK — LLM clients
  • httpx — async HTTP client
  • prompt_toolkit — classic CLI
  • React + Ink — TUI (TypeScript)
  • React 19 + Vite — Web dashboard
  • SQLite WAL + FTS5 — state persistence
  • YAML + .env — configuration
Protocols
  • ACP — Agent Client Protocol (editor integration)
  • MCP — Model Context Protocol (tool interop)
  • OpenAI API — standard tool calling format
  • Anthropic API — native adapter with caching
Messaging SDKs
  • python-telegram-bot
  • discord.py (with voice)
  • slack-bolt + slack-sdk
  • matrix-nio (E2E encryption)
  • dingtalk-stream, lark-oapi
Media & Cloud
  • Fal.ai — image generation
  • ElevenLabs / edge-tts / OpenAI TTS
  • faster-whisper — local speech-to-text
  • Firecrawl, Exa, Tavily — web backends
  • Modal, Daytona — cloud compute