Hermes Agent

C4 System Architecture — v0.14.0 — Nous Research

Hermes is a Python-based AI agent framework that connects LLMs to the real world through 30+ tools, 27 messaging platforms, multiple execution backends, and a composable skill system. It runs as a CLI, a React terminal TUI, a web dashboard, a messaging gateway daemon, an ACP server for editors, and an MCP server for tool interop.

30+

Built-in Tools

Platform Adapters

30+

LLM Providers

Exec Backends

Memory Providers

Entry Points

Level 1 — System Context

Who interacts with Hermes?

Four actor types drive Hermes, and it depends on seven categories of external systems. The diagram below shows the complete system boundary.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Actors

Hermes System

External Systems

Level 2 — Container Diagram

Six entry points, one core engine

Hermes deploys in six modes — CLI, TUI (React/Ink), Web Dashboard, Gateway, ACP Server, and MCP Server — all driving the same AIAgent core. The core delegates to an intelligence layer, a tool system, persistent state, and a cron scheduler.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

CLI

Classic prompt_toolkit REPL. Streaming output, session picker, inline model switching.

TUI

React/Ink terminal UI (ui-tui/). Communicates with a Python JSON-RPC backend (tui_gateway/).

Web Dashboard

React 19 + Vite browser SPA (web/). Config, sessions, logs, analytics. Plugin-extensible tabs. Chat tab embeds the TUI by opening /api/pty on the FastAPI backend (hermes_cli/web_server.py), which spawns hermes --tui behind a POSIX pseudo-terminal; xterm.js renders the ANSI in-browser. A second /api/ws JSON-RPC sidecar carries structured metadata (tool-call sidebar, slash launcher, model badge).

Gateway

Long-running daemon. Bridges 22 messaging platforms. Manages sessions, cron, reconnection.

ACP Server

Agent Client Protocol for Cursor/VS Code/Zed. Sessions, prompts, fork, cancel, slash commands. Pre-execution edit approval guard (acp_adapter/edit_approval.py) bound via ContextVar gates write_file and patch; policies: ask, workspace_session, session.

MCP Server

Exposes gateway conversations as MCP tools. Claude Code, Cursor, Codex can consume them.

Level 3 — Core Agent Engine

The conversation loop

AIAgent.run_conversation() is the heart of the system — a multi-turn loop that calls the LLM, parses tool calls, executes them (optionally in parallel), checks context limits, and loops until the model produces a final text response. It orchestrates all agent intelligence components.

Conversation Loop

Build Prompt

Identity, persona,
skills, memory

→

Get Tools

Resolve toolsets,
filter, schema

→

Call LLM

Streaming via
OpenAI/Anthropic SDK

→

Parse Response

Text or tool_calls
+ native parsers

→

Guardrails

Loop detection,
duplicate filtering

→

Execute Tools

Parallel when safe,
safety analysis

→

Compress?

LLM-summarize if
near token limit

⟳

Prompt Builder

identity agent name and persona
soul personality / custom persona
skills available skill catalog
memory frozen snapshot at session start
context files .hermes.md, AGENTS.md
Security scan for injection in context

Context Compressor

LLM-based mid-conversation compression
Protect head messages (system prompt)
Protect tail by token budget (~20K)
Summarize middle into structured template
Iterative re-compression support
Uses cheap auxiliary model

Credential Pool

Multiple API keys per provider
Round-robin key rotation
Automatic retry on rate limits

Usage Pricing

Per-provider pricing tables
Real-time cost tracking
Token counting and limits

Iteration Budget

90 iterations default (parent)
Per-subagent independent budget
Thread-safe counter
execute_code iterations refunded

Tool Guardrails

Pure stateless controller (agent/tool_guardrails.py) that sits between response parsing and tool execution. Tracks per-turn observations and returns decisions — no side effects.

Detects repeated identical tool calls (idempotent loop detection)
Classifies tools as idempotent vs mutating
Fingerprints tool calls for duplicate filtering
Returns guidance/warnings, never blocks directly

Level 3 — Tool System

Registry, toolsets, and execution

Every tool self-registers into a singleton ToolRegistry at import time. model_tools.py discovers all tool modules, resolves composable toolsets, and provides the OpenAI-format schemas to the agent. Execution environments are pluggable backends for the terminal tool. See Data Model § Tool Registry for schemas and toolset composition.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Execution Environment Backends

Local

Direct subprocess

Docker

Container isolation

SSH

Remote execution

Modal

Cloud GPU compute

Daytona

Cloud dev envs

Singularity

HPC containers

Vercel

Cloud sandboxes

Subagent Architecture (delegate_tool)

Spawning

Parent AIAgent spawns children via ThreadPoolExecutor. Max 3 concurrent children. Max depth 1 by default (configurable up to 3 via delegation.max_spawn_depth).

Isolation

Each child gets its own conversation, task_id, restricted toolset. Cannot: delegate, clarify, memory, send_message, execute_code.

All 30+ Built-in Tools

Terminal & File

terminal execute shell commands
read_file read with line numbers
write_file create / overwrite files
search_files ripgrep-backed search
patch fuzzy find-and-replace
process manage background procs

Web & Browser

web_search multi-backend search
web_extract page content extraction
browser_navigate open URLs
browser_click interact with elements
browser_snapshot accessibility tree
browser_vision screenshot + AI analysis

Agent & Memory

memory persistent key-value store
session_search search past sessions
delegate_task spawn subagents
execute_code Python scripting
clarify ask user questions
todo task list management
cronjob schedule tasks
send_message cross-platform delivery

Skills

skill_view load skill content
skill_manage create/edit/delete
skills_list browse catalog

Media

vision_analyze image understanding
image_generate Fal.ai (default FLUX 2 Klein 9B)
text_to_speech multi-provider TTS

Integration

mcp_tool call external MCP servers
homeassistant smart home control
honcho_tools AI-native memory (plugin)

Level 3 — Gateway

27 platforms, one session layer

The Gateway daemon multiplexes across all messaging platforms. GatewayRunner manages platform adapter lifecycles, caches AIAgent instances per session for prefix-cache efficiency, and routes output via DeliveryRouter.

GatewayRunner

Platform adapter lifecycle via PlatformRegistry
AIAgent cache per session
Interrupt and approval handling
Background platform reconnection
Spawns cron tick every 60s

SessionStore

Conversation persistence to disk
Reset policy evaluation
Platform tagging per session
PII hashing (phone IDs)

DeliveryRouter

"origin" → back to source
"telegram:123" → explicit target
Home channel routing
Deduplication

Platform Adapters (BasePlatformAdapter + PlatformRegistry)

Telegram Discord Slack WhatsApp Signal Matrix BlueBubbles Mattermost Email SMS Home Assistant Webhook API Server DingTalk Feishu WeCom Weixin QQBot Yuanbao Telegram Network IRC Teams

Built-in adapters live in gateway/platforms/. Plugin adapters (Google Chat, IRC, Line, Simplex, Teams) live in plugins/platforms/ and register dynamically via PlatformRegistry.

State & Persistence

Pluggable memory, built-in state

SessionDB (SQLite)

WAL mode for concurrent readers
Schema v11 with migrations
sessions table — model, tokens, costs, title
messages table — role, content, tool_calls
messages_fts — FTS5 full-text search
Application-level retry with jitter
Compression-triggered session splitting

Built-in Memory (always on)

MEMORY.md — agent knowledge notes
USER.md — user profile and prefs
§-delimited entries, character-limited
Snapshot frozen at session start
Security scan on writes
Wrapped as builtin_memory_provider.py

Pluggable Memory Providers (MemoryProvider ABC)

One external provider active alongside built-in memory, managed by agent/memory_manager.py. Providers live in plugins/memory/<name>/ with lifecycle hooks (on_pre_compress(), on_delegation(), session-scoped session_id). See Data Model § Memory Providers for the class hierarchy and interface contract.

Honcho OpenViking Mem0 Hindsight Holographic RetainDB ByteRover Supermemory

Kanban Board (multi-agent task coordination)

Separate kanban.db SQLite database — profile-agnostic, shared across all Hermes profiles on one machine. CAS-based claim protocol with 15-min TTL and heartbeat extension. The dispatcher (embedded in gateway) polls ready tasks, spawns worker agents with kanban_tools (gated by HERMES_KANBAN_TASK env var — invisible in normal sessions). Dashboard plugin provides a FastAPI router + WebSocket event stream for live board updates. See Data Model § Entity Map for the full schema.

Entities

tasks — title, assignee, status (7 states), priority, workspace
task_runs — attempt history with heartbeat and outcome
task_events — append-only audit log (tailed by WebSocket)
task_comments — threaded discussion per task
task_links — parent-child DAG for subtasks

Surfaces

hermes kanban CLI — human board management
/kanban slash command — gateway access
Dashboard plugin — web UI tab with live updates
Dispatcher — embedded in gateway, spawns worker agents

Skills System

YAML frontmatter + Markdown body
Categories, templates, scripts, references
Skills Hub for community sharing
Optional skills directory

Configuration

config.yaml — main settings + memory provider selection
.env — API keys (0600 perms)
auth.json — credential pool state
hermes memory setup — interactive provider wizard
hermes memory status / hermes memory off

Batch Runner

Parallel batch processing via multiprocessing.Pool
Dataset loading (JSONL) with configurable batch size
Checkpointing for fault tolerance and resumption
Trajectory saving + tool usage aggregation
Toolset distribution selection per run

Localization

locales/ — 16 language YAML catalogs
Scope: CLI approval prompts + select gateway slash replies
Agent output, logs, tool results stay English
Parity enforced by tests/agent/test_i18n.py

Key Data Flows

How messages travel through the system

Interactive CLI Session

User types

CLI / TUI /
Web / Gateway

→

AIAgent

run_conversation()

→

LLM Call

streaming
response

→

Tool Exec

parallel when
safe

⟳

Response

stream to
terminal

→

Save

SessionDB

Gateway Message Flow

Platform Msg

Telegram/
Discord/...

→

Adapter

receive +
normalize

→

SessionStore

load / create

→

AIAgent

run_conversation()

→

Deliver

DeliveryRouter
→ Adapter.send()

Cron Job Execution

60s Tick

Gateway daemon
background thread

→

File Lock

.tick.lock
prevents races

→

Scan Jobs

jobs.json
find due

→

Spawn Agent

isolated session
+ skill prompts

→

Deliver

target platform
or local

Auxiliary Model System. Side tasks use separate model configs, not the main conversation model. See Data Model § Config Hierarchy for the full task list and per-task timeouts. Default fallback: google/gemini-3-flash-preview via OpenRouter. Each task is independently configurable in config.yaml.

Technology Stack

What powers Hermes under the hood

Core Runtime

Python ≥3.11 — primary language
openai + anthropic SDK — LLM clients
httpx — async HTTP client
prompt_toolkit — classic CLI
React + Ink — TUI (TypeScript)
React 19 + Vite — Web dashboard
SQLite WAL + FTS5 — state persistence
YAML + .env — configuration

Protocols

ACP — Agent Client Protocol (editor integration)
MCP — Model Context Protocol (tool interop)
OpenAI API — standard tool calling format
Anthropic API — native adapter with caching

Messaging SDKs

python-telegram-bot
discord.py (with voice)
slack-bolt + slack-sdk
matrix-nio (E2E encryption)
dingtalk-stream, lark-oapi

Media & Cloud

Fal.ai — image generation
ElevenLabs / edge-tts / OpenAI TTS
faster-whisper — local speech-to-text
Firecrawl, Exa, Tavily — web backends
Modal, Daytona — cloud compute