Hermes Agent
Diátaxis Documentation Map — v0.14.0 — Nous Research
The C4 diagrams show what Hermes is built from and how parts connect. This page fills in the rest: why the architecture is shaped this way, how to extend it, the mental model a newcomer needs, and the reference surfaces for daily work. It follows the Diátaxis framework — four documentation quadrants, each serving a different human need.
Four kinds of documentation, four human needs
Diátaxis identifies four modes of documentation that shouldn't be mixed. Each has a different relationship to action vs. knowledge and learning vs. working. Hermes needs all four.
Think of Hermes as a switchboard operator
Before diving into modules and code, build this picture in your head — a message comes in, the Agent thinks and acts in a loop, and an answer goes back out:
That's the entire system. Everything else — the 30+ tools, 27 platform adapters, credential pool, context compressor, cron scheduler — is supporting infrastructure around this single loop.
Ring 1: Entry Points
CLI, TUI, Web Dashboard, Gateway, ACP, MCP Server. These accept messages from different worlds and normalize them. They are thin — as little logic as possible.
Ring 2: The Agent Core
AIAgent + its intelligence modules. This is where decisions are made:
which model to call, how to build the prompt, when to compress, when to stop.
A MemoryManager orchestrates pluggable memory providers alongside the built-in store.
Ring 3: The Tool Surface
30+ tools that the Agent can invoke. Terminal, files, web, browser, memory, delegation. Each is self-contained; the Agent doesn't know their internals.
How to extend Hermes
Hermes has six well-defined extension surfaces. For each one: what it is, when to use it, and what's involved.
When: You need the agent to do something that requires end-to-end API key integration, custom processing logic that must execute precisely, or binary data / streaming / real-time events.
Important: If it can be expressed as instructions + shell commands, make it a skill instead. The answer is almost always skill, not tool.
- Create
tools/your_tool.pywith schema + handler +registry.register() - That's it for discovery —
discover_builtin_tools()intools/registry.pyauto-imports everytools/*.pyat startup - Add the tool name to
toolsets.py(_HERMES_CORE_TOOLSfor default availability, or a newTOOLSETSentry for a niche group) - Provide a
check_fnfor runtime availability gating (env vars, optional deps)
When: You want Hermes to speak a new messaging platform (e.g., Zulip, Teams, a custom webhook format).
Interface: Subclass BasePlatformAdapter and implement three abstract methods.
Can live in gateway/platforms/ (built-in) or as a platform plugin
that self-registers via PlatformRegistry.
- Required:
connect()→ bool - Required:
disconnect()→ None - Required:
send(chat_id, content, ...)→ SendResult - Optional:
send_image,send_voice,send_document,edit_message - Incoming messages emit
MessageEvent(normalized format) - Plugin path:
plugins/platforms/<name>/withplugin.yaml(kind: platform)
When: You want the terminal tool to execute commands somewhere new (e.g., Kubernetes pods, AWS Lambda, a custom sandbox).
Interface: Subclass BaseEnvironment. Two methods.
- Required:
execute(command, cwd, timeout)→ dict with output + returncode - Required:
cleanup()→ release resources - Shared helpers:
_prepare_command(),_timeout_result() - Register via
TERMINAL_ENVenvironment variable
When: You want to plug in a new long-term memory backend (e.g., a vector database, a knowledge graph, a custom cloud service).
Interface: Subclass MemoryProvider ABC from
agent/memory_provider.py. Implement lifecycle hooks. Drop into
plugins/memory/<name>/.
- Provide
plugin.yaml+__init__.pywithregister() - Lifecycle hooks: session start/end, pre/post tool calls
on_pre_compress()→ inject insights into summarieson_delegation()→ notified of subagent taskssession_idon all hooks for concurrent scoping- Prefetch is cached once per tool loop (latency-safe)
- 8 shipped providers as reference implementations
- Drop folder in
~/.hermes/plugins/<name>/ - Provide
plugin.yamlmanifest +__init__.py - Implement
register(ctx: PluginContext) ctx.register_tool()— add custom toolsctx.register_hook()— pre/post tool/LLM callsctx.inject_message()— inject into conversation- Hooks:
pre_tool_call,post_tool_call,pre_llm_call,post_llm_call,on_session_start,on_session_end
- Create
~/.hermes/skills/<category>/<name>/SKILL.md - YAML frontmatter + Markdown instructions
- Can include
references/,templates/,scripts/ - No code needed — just instructions the agent follows
- Prefer skills over tools for anything expressible as instructions + shell commands
- Skills Hub available for community sharing
Where everything lives
Factual, austere, meant to be consulted. The structure of the documentation mirrors the structure of the code.
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit
Configuration Reference (config.yaml)
Core Settings
model— default LLM model identifierprovider— LLM provider namebase_url— custom API endpointapi_key— API key (prefer .env)personality— SOUL.md persona filemax_iterations— agent loop budget (default: 90)reasoning_effort— xhigh/high/medium/low/none
Terminal Settings
terminal.backend— local/docker/ssh/modal/daytona/singularityterminal.timeout— command timeout secondsterminal.docker_image— container imageterminal.container_persistent— survive sessionsterminal.env_passthrough— env vars to forward
Compression
compression.enabled— auto-compress on/offcompression.threshold_percent— trigger at % of contextcompression.protect_last_n— messages to protectcompression.summary_model— override auxiliary model
Auxiliary Models
auxiliary.vision— image analysis modelauxiliary.web_extract— content extraction modelauxiliary.compression— summarization modelauxiliary.approval— command safety check model- Each has: model, provider, base_url, api_key, timeout
File System Layout (~/.hermes/)
Why the architecture is shaped this way
Every architectural choice has a trade-off. These are the ones that matter most for understanding Hermes — the decisions that would confuse you if you didn't know the reasoning.
Without explanation, the practitioner's knowledge of their craft is loose and fragmented, and their exercise of it is anxious.
— Diátaxis, on the value of explanation_PARALLEL_SAFE_TOOLS (read-only tools),
_NEVER_PARALLEL_TOOLS (interactive tools like clarify), and
_PATH_SCOPED_TOOLS (safe only if targeting different file paths). The key
trade-off: if ANY tool in a batch is unknown or has unparseable args, the entire batch
falls back to sequential. Safety over speed, always.
AIAgent takes ~10 callback parameters (stream_callback,
tool_progress, thinking, clarify, step,
status, …) which makes it completely platform-agnostic — the CLI
injects terminal spinners, the Gateway injects platform sending, ACP injects editor
responses. Same agent, different shells. The trade-off: AIAgent.__init__ has
50+ parameters — a “god class,” but cohesive around one conversation lifecycle.
registry.register() at module import time. The registry
(tools/registry.py) has zero imports from tool files — preventing circular
dependencies. Discovery happens when model_tools.py imports all tool modules.
This means adding a tool requires only creating the file, importing it, and declaring its toolset.
No central manifest to keep in sync. The trade-off: tool registration is a side effect of
importing, which is unusual in Python.
_is_destructive_command()) uses regex heuristics
for rm, mv, dd, git reset --hard.
Path overlap detection uses prefix comparison rather than inode resolution (because "the file
may not exist yet"). Context compression uses character limits, not token counts (because
"char counts are model-independent"). These are all conscious simplicity choices
that trade precision for reliability and portability.
delegate_task (prevents fork bombs),
clarify (would confuse the user — they didn't talk to this agent),
memory (prevents shared state corruption), send_message
(no cross-platform side effects), execute_code (children should reason
step-by-step). Max depth is 1 by default (configurable up to 3 via
delegation.max_spawn_depth). Max concurrent children is 3 by default.
The parent only sees
the delegation call + summary result, never intermediate tool calls —
information hiding for context efficiency.
fill_first, round_robin,
random, least_used), with fill_first as default
because it maximizes provider-side cache hits.
honcho_integration/ subsystem into a
pluggable provider interface — MemoryProvider ABC in
agent/memory_provider.py, orchestrated by agent/memory_manager.py.
Honcho became just one of 8 plugins in plugins/memory/, matching the same ABC-plus-plugins
pattern used for platform adapters and execution backends. Key design choices:
- Built-in memory (
MEMORY.md/USER.md) is always on alongside the external provider. prefetch_all()is cached once before the tool loop to avoid 10x latency spikes; sync happens non-blocking on threads.- New hooks
on_pre_compress()andon_delegation()let providers inject insights into compression summaries and subagent results. - Cron and gateway flush agents pass
skip_memory=Trueso system prompts don't get ingested as user data.
Patterns that show up everywhere
Config loading never crashes. Credential pool rotates on failure. Compression fires
automatically. Windows gets no-op stubs for Unix-only features (fcntl → msvcrt
fallback). Plugin hooks are fault-tolerant: each callback is wrapped in try/except so a
misbehaving plugin cannot break the core loop.
Context files are scanned for prompt injection before ingestion. Memory entries are scanned
for exfiltration patterns (curl, SSH backdoors, invisible unicode). File permissions are
restrictive (0600 for secrets, 0700 for directories). The tirith binary does
pre-execution security scanning. Multiple layers, each catching different threat vectors.
System prompts and prefill messages are injected at API call time but never persisted to the
database. ephemeral_system_prompt is used during execution but excluded from RL
training trajectories. This keeps stored conversations clean, reproducible, and free of
runtime-specific scaffolding.
get_hermes_home() is the SOLE accessor for the Hermes directory — Path.home() / ".hermes"
is never used directly. Profile switching sets HERMES_HOME before any module imports,
so all 119+ references resolve to the active profile. Profile operations themselves are
HOME-anchored (not HERMES_HOME), so hermes -p coder profile list can see all profiles.
Where the pluggable boundaries are
A visual summary of every extension point and what it takes to use each one.
Ctrl/Cmd + wheel to zoom · Scroll to pan · Drag when zoomed · Double-click to fit