Hermes Agent

Diátaxis Documentation Map — v0.14.0 — Nous Research

The C4 diagrams show what Hermes is built from and how parts connect. This page fills in the rest: why the architecture is shaped this way, how to extend it, the mental model a newcomer needs, and the reference surfaces for daily work. It follows the Diátaxis framework — four documentation quadrants, each serving a different human need.

The Diátaxis Framework

Four kinds of documentation, four human needs

Diátaxis identifies four modes of documentation that shouldn't be mixed. Each has a different relationship to action vs. knowledge and learning vs. working. Hermes needs all four.

Practical + Learning
Tutorials
Learning-oriented. Walk a newcomer through Hermes so they build a working mental model. Like a cooking lesson with a child.
"Walk me through this system"
Practical + Working
How-to Guides
Goal-oriented. Show how to accomplish specific tasks: add a tool, build a platform adapter, write a plugin. Like a recipe.
"How do I extend this?"
Theoretical + Learning
Explanation
Understanding-oriented. Why the architecture is shaped this way. Design decisions, trade-offs, constraints. Readable in the bath.
"Why is it designed this way?"
Theoretical + Working
Reference
Information-oriented. Austere, factual, complete. Module maps, config keys, API surfaces. Like food packaging labels.
"Give me the facts"
Where C4 fits. The C4 architecture page you already have is predominantly Reference (component maps, data flows, tech stack) with some Explanation (the "Key Data Flows" section). This page fills the gaps — especially the deep Explanation and How-to quadrants that component diagrams can't capture.
Tutorial — The Mental Model

Think of Hermes as a switchboard operator

Before diving into modules and code, build this picture in your head — a message comes in, the Agent thinks and acts in a loop, and an answer goes back out:

1. A message arrives
CLI, Telegram, Slack, editor plugin, or cron — all normalized into one shape and handed to the Agent.
2. The Agent thinks
Assembles context (user, skills, memory), builds a prompt, and streams from an LLM.
3. The Agent acts
If the LLM asks for a tool call, run it and feed the result back — think/act loops until a final answer.
4. The answer goes back
Routed to wherever the message came from — Telegram replies in Telegram, CLI streams to stdout.

That's the entire system. Everything else — the 30+ tools, 27 platform adapters, credential pool, context compressor, cron scheduler — is supporting infrastructure around this single loop.

The three concentric rings

Ring 1: Entry Points

CLI, TUI, Web Dashboard, Gateway, ACP, MCP Server. These accept messages from different worlds and normalize them. They are thin — as little logic as possible.

Ring 2: The Agent Core

AIAgent + its intelligence modules. This is where decisions are made: which model to call, how to build the prompt, when to compress, when to stop. A MemoryManager orchestrates pluggable memory providers alongside the built-in store.

Ring 3: The Tool Surface

30+ tools that the Agent can invoke. Terminal, files, web, browser, memory, delegation. Each is self-contained; the Agent doesn't know their internals.

The key insight for newcomers: Hermes is NOT a monolith with 27 messaging integrations. It's a single agent loop with a pluggable front door (platforms) and a pluggable toolkit (tools). Everything else is glue.
How-to — Extension Points

How to extend Hermes

Hermes has six well-defined extension surfaces. For each one: what it is, when to use it, and what's involved.

Add a New Tool (2 files)

When: You need the agent to do something that requires end-to-end API key integration, custom processing logic that must execute precisely, or binary data / streaming / real-time events.

Important: If it can be expressed as instructions + shell commands, make it a skill instead. The answer is almost always skill, not tool.

  • Create tools/your_tool.py with schema + handler + registry.register()
  • That's it for discovery — discover_builtin_tools() in tools/registry.py auto-imports every tools/*.py at startup
  • Add the tool name to toolsets.py (_HERMES_CORE_TOOLS for default availability, or a new TOOLSETS entry for a niche group)
  • Provide a check_fn for runtime availability gating (env vars, optional deps)
Add a Platform Adapter (1 class or plugin)

When: You want Hermes to speak a new messaging platform (e.g., Zulip, Teams, a custom webhook format).

Interface: Subclass BasePlatformAdapter and implement three abstract methods. Can live in gateway/platforms/ (built-in) or as a platform plugin that self-registers via PlatformRegistry.

  • Required: connect() → bool
  • Required: disconnect() → None
  • Required: send(chat_id, content, ...) → SendResult
  • Optional: send_image, send_voice, send_document, edit_message
  • Incoming messages emit MessageEvent (normalized format)
  • Plugin path: plugins/platforms/<name>/ with plugin.yaml (kind: platform)
Add an Execution Backend (1 class)

When: You want the terminal tool to execute commands somewhere new (e.g., Kubernetes pods, AWS Lambda, a custom sandbox).

Interface: Subclass BaseEnvironment. Two methods.

  • Required: execute(command, cwd, timeout) → dict with output + returncode
  • Required: cleanup() → release resources
  • Shared helpers: _prepare_command(), _timeout_result()
  • Register via TERMINAL_ENV environment variable
Write a Memory Provider (MemoryProvider ABC)

When: You want to plug in a new long-term memory backend (e.g., a vector database, a knowledge graph, a custom cloud service).

Interface: Subclass MemoryProvider ABC from agent/memory_provider.py. Implement lifecycle hooks. Drop into plugins/memory/<name>/.

  • Provide plugin.yaml + __init__.py with register()
  • Lifecycle hooks: session start/end, pre/post tool calls
  • on_pre_compress() → inject insights into summaries
  • on_delegation() → notified of subagent tasks
  • session_id on all hooks for concurrent scoping
  • Prefetch is cached once per tool loop (latency-safe)
  • 8 shipped providers as reference implementations
Write a Plugin (plugin.yaml + register)
  • Drop folder in ~/.hermes/plugins/<name>/
  • Provide plugin.yaml manifest + __init__.py
  • Implement register(ctx: PluginContext)
  • ctx.register_tool() — add custom tools
  • ctx.register_hook() — pre/post tool/LLM calls
  • ctx.inject_message() — inject into conversation
  • Hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end
Create a Skill (SKILL.md)
  • Create ~/.hermes/skills/<category>/<name>/SKILL.md
  • YAML frontmatter + Markdown instructions
  • Can include references/, templates/, scripts/
  • No code needed — just instructions the agent follows
  • Prefer skills over tools for anything expressible as instructions + shell commands
  • Skills Hub available for community sharing
Reference — Module Map

Where everything lives

Factual, austere, meant to be consulted. The structure of the documentation mirrors the structure of the code.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Configuration Reference (config.yaml)

Core Settings

  • model — default LLM model identifier
  • provider — LLM provider name
  • base_url — custom API endpoint
  • api_key — API key (prefer .env)
  • personality — SOUL.md persona file
  • max_iterations — agent loop budget (default: 90)
  • reasoning_effort — xhigh/high/medium/low/none

Terminal Settings

  • terminal.backend — local/docker/ssh/modal/daytona/singularity
  • terminal.timeout — command timeout seconds
  • terminal.docker_image — container image
  • terminal.container_persistent — survive sessions
  • terminal.env_passthrough — env vars to forward

Compression

  • compression.enabled — auto-compress on/off
  • compression.threshold_percent — trigger at % of context
  • compression.protect_last_n — messages to protect
  • compression.summary_model — override auxiliary model

Auxiliary Models

  • auxiliary.vision — image analysis model
  • auxiliary.web_extract — content extraction model
  • auxiliary.compression — summarization model
  • auxiliary.approval — command safety check model
  • Each has: model, provider, base_url, api_key, timeout
File System Layout (~/.hermes/)
~/.hermes/ # Configuration & secrets ├── config.yaml — main configuration (YAML) ├── .env — API keys (0600 perms) ├── auth.json — credential pool state, OAuth tokens ├── SOUL.md — agent persona override # Persistent state (SQLite) ├── state.db — SessionDB: sessions, messages, FTS5 ├── kanban.db — Kanban board (profile-agnostic) # Memory (built-in provider) ├── memories/ │ ├── MEMORY.md — agent knowledge store (§-delimited) │ └── USER.md — user profile store (§-delimited) # Skills & plugins (user-installed) ├── skills/ — installed skills (SKILL.md files) ├── plugins/ — user plugins (if any) # Cron jobs ├── cron/ │ ├── jobs.json — scheduled job definitions │ ├── output/ — job execution logs │ └── .tick.lock — file lock for tick() (fcntl/msvcrt) ├── scripts/ — cron pre-run scripts # Profiles & sessions ├── profiles/ — multi-instance profiles (each has its own subtree) ├── sessions/ — per-session transcripts (gateway) # Runtime caches & logs ├── checkpoints/ — batch_runner resume state ├── logs/ — agent.log, errors.log, gateway.log ├── image_cache/ — images returned by tools ├── pastes/ — large-paste promotion cache ├── sandboxes/ — per-session sandboxes (when in use) └── hermes-agent/ — the codebase itself (when bundled this way)
Explanation — Design Rationale

Why the architecture is shaped this way

Every architectural choice has a trade-off. These are the ones that matter most for understanding Hermes — the decisions that would confuse you if you didn't know the reasoning.

Without explanation, the practitioner's knowledge of their craft is loose and fragmented, and their exercise of it is anxious.

— Diátaxis, on the value of explanation
Key Design Decisions
1
Prompt cache preservation is a first-class architectural concern
Anthropic's prompt caching charges dramatically less for cache hits than misses, so Hermes goes to unusual lengths to preserve the stable prefix: Memory injects a frozen snapshot at session start and never updates the system prompt mid-session; Honcho appends to user messages instead of the system prompt; context compression is the only operation allowed to modify past context. The codebase explicitly warns: don't alter past context, change toolsets, reload memories, or rebuild system prompts mid-conversation.
2
Skills over tools — almost always
A recurring architectural question: "Should this capability be a tool or a skill?" Hermes draws a clear line. Tools are for: end-to-end API key integration, custom processing logic that must execute precisely, binary data / streaming / real-time events. Everything else — anything expressible as instructions + shell commands — should be a skill. Skills are procedural memory (Markdown files). Tools are code. This keeps the codebase small and the extension surface approachable.
3
Safety-first parallel execution
When the LLM returns multiple tool calls, the agent analyzes whether they can run in parallel. It maintains three lists: _PARALLEL_SAFE_TOOLS (read-only tools), _NEVER_PARALLEL_TOOLS (interactive tools like clarify), and _PATH_SCOPED_TOOLS (safe only if targeting different file paths). The key trade-off: if ANY tool in a batch is unknown or has unparseable args, the entire batch falls back to sequential. Safety over speed, always.
4
Callback-based platform abstraction
AIAgent takes ~10 callback parameters (stream_callback, tool_progress, thinking, clarify, step, status, …) which makes it completely platform-agnostic — the CLI injects terminal spinners, the Gateway injects platform sending, ACP injects editor responses. Same agent, different shells. The trade-off: AIAgent.__init__ has 50+ parameters — a “god class,” but cohesive around one conversation lifecycle.
5
Self-registering tools eliminate the manifest
Each tool file calls registry.register() at module import time. The registry (tools/registry.py) has zero imports from tool files — preventing circular dependencies. Discovery happens when model_tools.py imports all tool modules. This means adding a tool requires only creating the file, importing it, and declaring its toolset. No central manifest to keep in sync. The trade-off: tool registration is a side effect of importing, which is unusual in Python.
6
Pragmatic heuristics over perfect analysis
Destructive command detection (_is_destructive_command()) uses regex heuristics for rm, mv, dd, git reset --hard. Path overlap detection uses prefix comparison rather than inode resolution (because "the file may not exist yet"). Context compression uses character limits, not token counts (because "char counts are model-independent"). These are all conscious simplicity choices that trade precision for reliability and portability.
7
Subagents are deliberately restricted
Delegated children cannot: delegate_task (prevents fork bombs), clarify (would confuse the user — they didn't talk to this agent), memory (prevents shared state corruption), send_message (no cross-platform side effects), execute_code (children should reason step-by-step). Max depth is 1 by default (configurable up to 3 via delegation.max_spawn_depth). Max concurrent children is 3 by default. The parent only sees the delegation call + summary result, never intermediate tool calls — information hiding for context efficiency.
8
Generous credential exhaustion cooldowns
When an API key hits an auth error (HTTP 401), it cools down for 5 minutes so transient token refreshes recover quickly. Rate limits (HTTP 429) and billing errors (HTTP 402) cool down for 1 hour — intentionally generous to avoid hammering providers at the cost of potential under-utilization. The credential pool supports 4 selection strategies (fill_first, round_robin, random, least_used), with fill_first as default because it maximizes provider-side cache hits.
9
Pluggable memory replaces hardcoded Honcho
PR #4623 extracted the ~1,000-line honcho_integration/ subsystem into a pluggable provider interfaceMemoryProvider ABC in agent/memory_provider.py, orchestrated by agent/memory_manager.py. Honcho became just one of 8 plugins in plugins/memory/, matching the same ABC-plus-plugins pattern used for platform adapters and execution backends. Key design choices:
  • Built-in memory (MEMORY.md / USER.md) is always on alongside the external provider.
  • prefetch_all() is cached once before the tool loop to avoid 10x latency spikes; sync happens non-blocking on threads.
  • New hooks on_pre_compress() and on_delegation() let providers inject insights into compression summaries and subagent results.
  • Cron and gateway flush agents pass skip_memory=True so system prompts don't get ingested as user data.
Cross-cutting Architectural Themes

Patterns that show up everywhere

Graceful Degradation

Config loading never crashes. Credential pool rotates on failure. Compression fires automatically. Windows gets no-op stubs for Unix-only features (fcntlmsvcrt fallback). Plugin hooks are fault-tolerant: each callback is wrapped in try/except so a misbehaving plugin cannot break the core loop.

Defense in Depth

Context files are scanned for prompt injection before ingestion. Memory entries are scanned for exfiltration patterns (curl, SSH backdoors, invisible unicode). File permissions are restrictive (0600 for secrets, 0700 for directories). The tirith binary does pre-execution security scanning. Multiple layers, each catching different threat vectors.

Ephemeral vs Persistent Separation

System prompts and prefill messages are injected at API call time but never persisted to the database. ephemeral_system_prompt is used during execution but excluded from RL training trajectories. This keeps stored conversations clean, reproducible, and free of runtime-specific scaffolding.

Profile Isolation

get_hermes_home() is the SOLE accessor for the Hermes directory — Path.home() / ".hermes" is never used directly. Profile switching sets HERMES_HOME before any module imports, so all 119+ references resolve to the active profile. Profile operations themselves are HOME-anchored (not HERMES_HOME), so hermes -p coder profile list can see all profiles.

Extension Surface Map

Where the pluggable boundaries are

A visual summary of every extension point and what it takes to use each one.

Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit

Loading...
Start with skills. If you're unsure whether you need a tool, plugin, or skill — start with a skill. Skills are Markdown files with instructions. They require zero code, zero restart, and can be iterated in seconds. Promote to a tool or plugin only when you hit a capability wall (API keys, binary data, lifecycle hooks).