Hermes Runtime & Multi-Provider Dispatch
Hermes is Molecule AI's built-in inference router. Route tasks to Anthropic or Gemini through native dispatch paths, or to any OpenAI-compatible model through a compat shim, with correct multi-turn history on all three.
Hermes is Molecule AI's built-in inference router powering `runtime: hermes` workspaces. It supports three dispatch paths (a native Anthropic Messages API path, a native Gemini `generateContent` path, and an OpenAI-compatible shim for 13+ other providers), selected automatically by which API secret is present on the workspace.
Phases 2a through 2e are fully merged to main:
- Phase 2a (PR #240): native Anthropic dispatch
- Phase 2b (PR #255): native Gemini dispatch with the correct `role: "model"` + `parts` wire format
- Phase 2c (PR #267): multi-turn history preserved as turns (not flattened) on all three paths
- Phase 2d (PR #499): stacked system messages (`system_blocks` kwarg) on the Anthropic and Gemini paths
- Phase 2e (PRs #644, #645): native `tools=[]` parameter and `response_format=json_schema` structured output on the Anthropic native path
Remaining roadmap: vision content blocks and streaming on the native paths, plus native tools and structured output on the Gemini path, are scoped for a future release.
Dispatch table
Hermes selects an inference path based on which API key is set on the workspace. Keys are resolved in priority order:
`HERMES_API_KEY` → `OPENROUTER_API_KEY` → `ANTHROPIC_API_KEY` → `GEMINI_API_KEY`
The first non-empty key wins. Both `HERMES_API_KEY` and `OPENROUTER_API_KEY` outrank the native keys and route through the OpenAI-compat shim, so don't set either one on a workspace that should use native Anthropic or Gemini dispatch.
| Key present | Dispatch path | Provider | Wire format |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | Native Anthropic | Anthropic | Messages API: `{role, content}` |
| `GEMINI_API_KEY` | Native Gemini | Google | `generateContent`: `{role: "model", parts: [{text}]}` |
| `OPENROUTER_API_KEY` / `HERMES_API_KEY` / other | OpenAI-compat shim | 13+ providers | OpenAI Chat Completions |
| None | Error | — | — |
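The priority rules above can be modelled in a few lines. This is a minimal sketch, not Hermes source: `resolve_dispatch`, `effective_secrets`, and the path labels are hypothetical names, only the four keys in the priority list are covered, and it assumes (from the workspace-level override described under Secrets) that a workspace value shadows the global one and that an empty string counts as unset. Raising `RuntimeError` when no key is present is also an assumption; the table only says "Error".

```python
from typing import Dict, Mapping

# Priority order from the dispatch section: the first non-empty key wins.
KEY_PRIORITY = (
    "HERMES_API_KEY",
    "OPENROUTER_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
)

# Dispatch path selected by each key (labels are illustrative).
DISPATCH_PATH = {
    "HERMES_API_KEY": "openai-compat",
    "OPENROUTER_API_KEY": "openai-compat",
    "ANTHROPIC_API_KEY": "anthropic-native",
    "GEMINI_API_KEY": "gemini-native",
}


def effective_secrets(global_secrets: Mapping[str, str],
                      workspace_secrets: Mapping[str, str]) -> Dict[str, str]:
    """Workspace-level values shadow global ones, including empty strings."""
    merged = dict(global_secrets)
    merged.update(workspace_secrets)
    return merged


def resolve_dispatch(secrets: Mapping[str, str]) -> str:
    """Return the dispatch path for the highest-priority non-empty key."""
    for key in KEY_PRIORITY:
        if secrets.get(key):  # missing and "" are both treated as unset
            return DISPATCH_PATH[key]
    raise RuntimeError("no provider API key set")  # the "None | Error" row


if __name__ == "__main__":
    global_secrets = {"ANTHROPIC_API_KEY": "sk-ant-...", "GEMINI_API_KEY": "..."}
    print(resolve_dispatch(effective_secrets(global_secrets, {})))
    # anthropic-native
    print(resolve_dispatch(effective_secrets(global_secrets, {"ANTHROPIC_API_KEY": ""})))
    # gemini-native
```

The second call shows why clearing `ANTHROPIC_API_KEY` on a single workspace is enough to move it onto the Gemini path.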
Fail-loud semantics: if `ANTHROPIC_API_KEY` is set but the `anthropic` Python package is not installed in the workspace image, Hermes raises a `RuntimeError` immediately, before any inference attempt. The same applies to `GEMINI_API_KEY` and the `google-genai` package. Silently falling back to the compat shim would mask format errors, so Hermes fails loudly instead.
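A fail-loud dependency check could look like the sketch below. It is illustrative only: `require_sdk`, `is_importable`, and the path labels are hypothetical, and the error text is modelled on the Troubleshooting entry rather than taken from Hermes. It relies on `importlib.util.find_spec`, which locates a module without importing the module itself; for a dotted name such as `google.genai` it does import the parent package, so a missing parent is caught explicitly.

```python
import importlib.util

# Hypothetical mapping from dispatch path to the module its SDK provides.
# The google-genai distribution is imported as google.genai.
NATIVE_SDK = {
    "anthropic-native": "anthropic",
    "gemini-native": "google.genai",
}


def is_importable(module: str) -> bool:
    """True if `module` can be found, without importing the module itself."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # find_spec imports the parent of a dotted name; no parent, no submodule.
        return False


def require_sdk(path: str) -> None:
    """Raise before any inference if the native SDK for `path` is missing."""
    module = NATIVE_SDK.get(path)
    if module is not None and not is_importable(module):
        # Fail loudly instead of falling back to the OpenAI-compat shim.
        raise RuntimeError(f"{module} is not installed")
```

Running the check once at startup, rather than on first use, is what makes the failure surface before any inference attempt.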
Secrets
Set provider keys as global or workspace-level secrets:
```bash
# Native Anthropic dispatch
curl -X PUT http://localhost:8080/settings/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-..."}'

# Native Gemini dispatch
curl -X PUT http://localhost:8080/settings/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"GEMINI_API_KEY","value":"YOUR-GEMINI-KEY"}'

# OpenAI-compat shim (OpenRouter, Groq, Mistral, etc.)
curl -X PUT http://localhost:8080/settings/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"OPENROUTER_API_KEY","value":"sk-or-..."}'
```

To force a specific workspace to use Gemini dispatch when a global `ANTHROPIC_API_KEY` is set, clear the key at the workspace level:

```bash
curl -X PUT http://localhost:8080/workspaces/$GEMINI_WS/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"ANTHROPIC_API_KEY","value":""}'
```

Quickstart
Native Anthropic dispatch
```bash
export MOLECULE_API=http://localhost:8080

# 1. Store your Anthropic key
curl -s -X PUT $MOLECULE_API/settings/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"ANTHROPIC_API_KEY","value":"sk-ant-YOUR-KEY"}' | jq .

# 2. Create a Hermes workspace
ANTHROPIC_WS=$(curl -s -X POST $MOLECULE_API/workspaces \
  -H "Content-Type: application/json" \
  -d '{
    "name": "hermes-anthropic",
    "role": "Inference worker — native Anthropic path",
    "runtime": "hermes",
    "model": "anthropic:claude-sonnet-4-5"
  }' | jq -r '.id')

# 3. Wait for ready
until curl -s $MOLECULE_API/workspaces/$ANTHROPIC_WS \
  | jq -r '.status' | grep -q ready; do sleep 5; done

# 4. Confirm dispatch path
curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":"probe-1","method":"message/send",
    "params":{"message":{"role":"user","parts":[{"kind":"text",
      "text":"Which provider API are you calling to generate this response?"}]}}
  }' | jq '.result.parts[0].text'
# Expected: confirms Anthropic Messages API, no OpenAI-compat translation layer
```

Native Gemini dispatch
```bash
# 1. Store your Gemini key
curl -s -X PUT $MOLECULE_API/settings/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"GEMINI_API_KEY","value":"YOUR-GEMINI-KEY"}' | jq .

# 2. Create a Gemini workspace
GEMINI_WS=$(curl -s -X POST $MOLECULE_API/workspaces \
  -H "Content-Type: application/json" \
  -d '{
    "name": "hermes-gemini",
    "role": "Inference worker — native Gemini path",
    "runtime": "hermes",
    "model": "gemini:gemini-2.0-flash"
  }' | jq -r '.id')

# 2b. If ANTHROPIC_API_KEY is also set globally (for example, from the Anthropic
#     quickstart above), it outranks GEMINI_API_KEY. Clear it on this workspace:
curl -s -X PUT $MOLECULE_API/workspaces/$GEMINI_WS/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"ANTHROPIC_API_KEY","value":""}' | jq .

# 3. Wait for ready
until curl -s $MOLECULE_API/workspaces/$GEMINI_WS \
  | jq -r '.status' | grep -q ready; do sleep 5; done

# 4. Confirm dispatch path
curl -s -X POST $MOLECULE_API/workspaces/$GEMINI_WS/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":"probe-2","method":"message/send",
    "params":{"message":{"role":"user","parts":[{"kind":"text",
      "text":"Which provider API are you calling?"}]}}
  }' | jq '.result.parts[0].text'
# Expected: confirms Google generateContent, with the role: "model" + parts[] wrapper used correctly
```

Multi-turn history (Phase 2c)
```bash
# Turn 1
curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":"turn-1","method":"message/send",
    "params":{"message":{"role":"user","parts":[{"kind":"text",
      "text":"My name is Alice. Remember that."}]}}
  }' | jq '.result.parts[0].text'

# Turn 2: history is threaded as turns, not flattened into a single blob
curl -s -X POST $MOLECULE_API/workspaces/$ANTHROPIC_WS/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":"turn-2","method":"message/send",
    "params":{"message":{"role":"user","parts":[{"kind":"text",
      "text":"What is my name?"}]}}
  }' | jq '.result.parts[0].text'
# Expected: "Alice", since role attribution is preserved across turns
```

Before Phase 2c, multi-turn history was flattened into a single user blob. The model could often recover context from the text but lost clean role attribution, which caused failures on structured prompts. Phase 2c passes turns as turns: the OpenAI-compat and Anthropic paths use `{role, content}`, and the Gemini path uses `{role, parts: [{text}]}`, with `role: "model"` for the model's own turns.
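The difference between flattened and turn-by-turn history is easiest to see side by side. The sketch below is a simplified model, not Hermes code: the function names are hypothetical, history is a list of `(role, text)` pairs with roles `"user"` and `"assistant"`, system prompts are left out (both native APIs take them separately from the turn list), and the flattened format is an invented stand-in for the pre-2c blob.

```python
from typing import Dict, List, Tuple

# (role, text), where role is "user" or "assistant".
Turn = Tuple[str, str]


def to_anthropic(history: List[Turn]) -> List[Dict]:
    """Anthropic Messages API (and OpenAI Chat Completions): one {role, content} per turn."""
    return [{"role": role, "content": text} for role, text in history]


def to_gemini(history: List[Turn]) -> List[Dict]:
    """Gemini generateContent: model turns use role "model"; text lives in parts."""
    return [
        {"role": "model" if role == "assistant" else "user", "parts": [{"text": text}]}
        for role, text in history
    ]


def flatten(history: List[Turn]) -> List[Dict]:
    """Pre-Phase-2c style (illustrative): every turn squashed into one user message."""
    blob = "\n".join(f"{role}: {text}" for role, text in history)
    return [{"role": "user", "content": blob}]


if __name__ == "__main__":
    history = [
        ("user", "My name is Alice. Remember that."),
        ("assistant", "Noted, Alice."),
        ("user", "What is my name?"),
    ]
    print(to_gemini(history)[1])
    # {'role': 'model', 'parts': [{'text': 'Noted, Alice.'}]}
```

In the flattened form, the only evidence of who said what is the `"assistant: "` prefix inside a user message, which is exactly the role attribution Phase 2c restores.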
Multi-provider teams
An orchestrator can fan tasks to Anthropic and Gemini workers simultaneously, each routed through its native path — no application-level provider switching required:
```bash
# Fan out: both workers fire via delegate_task_async
# $ORCH_ID is the ID of your orchestrator workspace
curl -s -X POST $MOLECULE_API/workspaces/$ORCH_ID/a2a \
  -H "Content-Type: application/json" \
  -d "{
    \"jsonrpc\":\"2.0\",\"id\":\"fan-1\",\"method\":\"message/send\",
    \"params\":{\"message\":{\"role\":\"user\",\"parts\":[{\"kind\":\"text\",
      \"text\":\"delegate_task_async $ANTHROPIC_WS 'Draft release notes for v2.1' AND delegate_task_async $GEMINI_WS 'Summarise the last 30 days of support tickets'\"}]}}
  }" | jq .
```

Both workers receive correctly formatted messages through their native paths. No LiteLLM proxy layer. No format translation overhead on every request.
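The orchestrator handles the fan-out above. If you drive the workers directly from your own client instead, the same "fire both, then collect" pattern takes a few lines with a thread pool. This is a sketch under assumptions: `fan_out` is a hypothetical helper, and `send` stands in for whatever function posts an A2A `message/send` request to a workspace and returns the reply text (for example, a wrapper around the curl calls above).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# send(workspace_id, text) -> reply text. Hypothetical: supply your own
# transport, e.g. an HTTP POST to $MOLECULE_API/workspaces/<id>/a2a.
Sender = Callable[[str, str], str]


def fan_out(send: Sender, tasks: Dict[str, str]) -> Dict[str, str]:
    """Dispatch every task at once and wait for all replies, keyed by workspace ID."""
    if not tasks:
        return {}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {ws: pool.submit(send, ws, text) for ws, text in tasks.items()}
        # .result() blocks until that reply arrives and re-raises any error from send.
        return {ws: fut.result() for ws, fut in futures.items()}


if __name__ == "__main__":
    def fake_send(ws: str, text: str) -> str:
        return f"[{ws}] done: {text}"

    replies = fan_out(fake_send, {
        "anthropic-ws": "Draft release notes for v2.1",
        "gemini-ws": "Summarise the last 30 days of support tickets",
    })
    print(replies["gemini-ws"])
    # [gemini-ws] done: Summarise the last 30 days of support tickets
```

Threads suit this case because each call spends almost all of its time waiting on the network; provider selection still happens inside each workspace, so the client never branches on provider.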
Advanced: stacked system messages
NousResearch Hermes 4 works best when persona, tool context, and reasoning policy are sent as separate {"role": "system"} entries rather than one concatenated string. HermesA2AExecutor supports this via the system_blocks kwarg (PR #499).
Usage
```python
from workspace_template.executors.hermes_a2a_executor import HermesA2AExecutor

executor = HermesA2AExecutor(
    system_blocks=[
        "You are a senior security auditor. Be terse and precise.",  # persona
        "You have access to bash, file search, and grep tools.",     # tools context
        "Think step-by-step before concluding. Cite evidence.",      # reasoning policy
    ]
)
```

The executor emits each non-empty, non-`None` block as a separate `{"role": "system"}` message, in the order the list gives them. The recommended order is persona → tools context → reasoning policy.
Behaviour
| Condition | Result |
|---|---|
| `system_blocks` is set | Emits one `{"role": "system"}` per non-empty block; `system_prompt` is ignored |
| Entry is `None` or `""` | Silently skipped |
| All entries empty | Zero system messages emitted |
| `system_blocks` not set (`None`) | Falls back to the legacy `system_prompt` path; fully backward-compatible |
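The rules in the table reduce to a small function. This is a sketch of the documented behaviour, not the executor's actual code: `build_system_messages` is a hypothetical name, and treating an empty legacy `system_prompt` as "no system message" is an assumption the table doesn't cover.

```python
from typing import Dict, List, Optional, Sequence


def build_system_messages(
    system_blocks: Optional[Sequence[Optional[str]]] = None,
    system_prompt: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Turn system_blocks (or the legacy system_prompt) into system messages."""
    if system_blocks is not None:
        # system_blocks wins; system_prompt is ignored even if also set.
        # None and "" are skipped; list order is preserved.
        return [{"role": "system", "content": b} for b in system_blocks if b]
    if system_prompt:
        # Legacy path: a single system message.
        return [{"role": "system", "content": system_prompt}]
    return []


if __name__ == "__main__":
    print(build_system_messages(["Persona.", None, "", "Policy."]))
    # [{'role': 'system', 'content': 'Persona.'}, {'role': 'system', 'content': 'Policy.'}]
```

Note that an all-empty `system_blocks` list still takes the first branch and yields no system messages; it does not fall back to `system_prompt`, which matches the "All entries empty" row.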
Backward compatibility
Callers that pass a single system_prompt string are unaffected:
```python
# Legacy path: still works, no changes required
executor = HermesA2AExecutor(
    system_prompt="You are a security auditor. Think step-by-step."
)
```

Only set `system_blocks` when you want fine-grained control over block ordering or need to inject tool manifests into a dedicated block.
Native tools parameter (Phase 2e — PR #644)
Hermes now passes tool definitions to the model via the native tools=[] API parameter instead of injecting them as text in the prompt. This applies to the Anthropic native dispatch path and produces structured tool call/result blocks that the Nous/Hermes-3 tool call format handles correctly.
```python
executor = HermesA2AExecutor(
    tools=[
        {
            "name": "bash",
            "description": "Run a bash command and return stdout/stderr.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "The shell command to run"}
                },
                "required": ["command"]
            }
        }
    ]
)
```

The OpenAI-compat shim path also accepts `tools=[]` but continues to inject the definitions as text in the prompt, for compatibility with OpenRouter-routed models that don't natively support tool calls.
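On the Anthropic native path, a definition like the one above maps directly onto the `tools` argument of the Anthropic SDK's `client.messages.create(...)`, which takes `name`, `description`, and `input_schema` per tool. The sketch builds those keyword arguments without making a network call and contrasts them with a text rendering for the compat shim. Both helpers are hypothetical; in particular, the exact text Hermes injects on the shim path isn't documented here, so `tools_as_prompt_text` is only a stand-in.

```python
import json
from typing import Any, Dict, List

BASH_TOOL = {
    "name": "bash",
    "description": "Run a bash command and return stdout/stderr.",
    "input_schema": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "The shell command to run"}
        },
        "required": ["command"],
    },
}


def anthropic_kwargs(model: str, messages: List[Dict], tools: List[Dict],
                     max_tokens: int = 1024) -> Dict[str, Any]:
    """Keyword arguments for anthropic.Anthropic().messages.create(**kwargs)."""
    kwargs: Dict[str, Any] = {"model": model, "max_tokens": max_tokens, "messages": messages}
    if tools:
        kwargs["tools"] = tools  # native parameter; the prompt text stays untouched
    return kwargs


def tools_as_prompt_text(tools: List[Dict]) -> str:
    """Illustrative shim-style rendering: tool specs appended to the prompt as text."""
    lines = ["You can call these tools:"]
    for tool in tools:
        lines.append(f"- {tool['name']}: {tool['description']}")
        lines.append(f"  input schema: {json.dumps(tool['input_schema'])}")
    return "\n".join(lines)


if __name__ == "__main__":
    kwargs = anthropic_kwargs(
        "claude-sonnet-4-5",
        [{"role": "user", "content": "List the files in /tmp"}],
        [BASH_TOOL],
    )
    print(sorted(kwargs))
    # ['max_tokens', 'messages', 'model', 'tools']
```

With the `anthropic` package installed, passing these kwargs to `messages.create` sends the request, and any tool calls come back as `tool_use` content blocks rather than text the caller has to parse.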
Structured output — response_format (Phase 2e — PR #645)
response_format=json_schema is wired through to the Anthropic native dispatch path. Pass a JSON Schema definition to request strictly-typed JSON output from the model:
```python
executor = HermesA2AExecutor(
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "audit_finding",
            "schema": {
                "type": "object",
                "properties": {
                    "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
                    "description": {"type": "string"},
                    "remediation": {"type": "string"}
                },
                "required": ["severity", "description", "remediation"]
            }
        }
    }
)
```

On the Anthropic native path, the model's completion will always be valid JSON matching the schema. The Gemini native and OpenAI-compat shim paths do not yet support `response_format`; it is silently ignored on those paths.
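Because the Gemini and compat-shim paths silently ignore `response_format`, a caller that may run on any path can cheaply re-check the result. The sketch below checks only the keywords the `audit_finding` schema uses (`type: object`, `required`, string `type`, `enum`); `check_finding` is a hypothetical helper, and for arbitrary schemas a full JSON Schema validator such as the `jsonschema` package is the better choice.

```python
import json
from typing import Any, Dict

AUDIT_FINDING_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
        "description": {"type": "string"},
        "remediation": {"type": "string"},
    },
    "required": ["severity", "description", "remediation"],
}


def check_finding(raw: str, schema: Dict[str, Any] = AUDIT_FINDING_SCHEMA) -> Dict[str, Any]:
    """Parse a completion and check it against the subset of JSON Schema used above."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in schema.get("required", []):
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in data:
            continue
        if spec.get("type") == "string" and not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string")
        if "enum" in spec and data[key] not in spec["enum"]:
            raise ValueError(f"{key} must be one of {spec['enum']}")
    return data


if __name__ == "__main__":
    finding = check_finding(
        '{"severity": "high", "description": "World-writable /etc/cron.d",'
        ' "remediation": "chmod 755 /etc/cron.d"}'
    )
    print(finding["severity"])
    # high
```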
Capability table
Shipped (Phases 2a–2e — all merged to main)
| Capability | OpenAI-compat shim | Anthropic native | Gemini native |
|---|---|---|---|
| Plain text, single-turn | ✅ | ✅ | ✅ |
| Multi-turn history | ✅ role-attributed turns (Phase 2c) | ✅ role-attributed turns | ✅ role: "model" + parts wrapper |
| Correct Gemini wire format | ❌ wrong role, missing parts | — | ✅ |
| No compat-shim translation overhead | ❌ every request translated | ✅ | ✅ |
| Stacked system messages (`system_blocks`) | ❌ | ✅ | ✅ |
| Native `tools=[]` parameter | ⚠️ text-in-prompt injection | ✅ PR #644 | 📋 roadmap |
| Structured output (`response_format=json_schema`) | ❌ | ✅ PR #645 | 📋 roadmap |
Roadmap (future release)
| Capability | Anthropic native | Gemini native |
|---|---|---|
| Vision content blocks | 📋 | 📋 |
| Streaming | 📋 | 📋 |
| Native tools on Gemini path | — | 📋 |
| Structured output on Gemini path | — | 📋 |
Troubleshooting
RuntimeError: anthropic is not installed
The anthropic Python package is missing from the workspace image. Add anthropic to requirements.txt in your custom image and rebuild, or use the standard molecule-ai-workspace-template-hermes image.
Gemini workspace getting Anthropic dispatch instead
A global ANTHROPIC_API_KEY is taking priority. Clear it at the workspace level:
```bash
curl -X PUT $MOLECULE_API/workspaces/$GEMINI_WS/secrets \
  -H "Content-Type: application/json" \
  -d '{"key":"ANTHROPIC_API_KEY","value":""}'
```

Multi-turn context lost between calls
Each workspace maintains its own history buffer. Ensure you are sending all turns of a conversation to the same workspace. A2A context_id scopes history within the workspace.
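To keep a conversation in one history scope, reuse the context ID the workspace hands back. This sketch assumes the workspace uses the A2A JSON field name `contextId` for the context ID mentioned above, both on the outgoing message and on the `result` object it returns; `message_send` and `context_id_of` are hypothetical helpers that build the same request body as the curl examples.

```python
from typing import Any, Dict, Optional


def message_send(text: str, request_id: str,
                 context_id: Optional[str] = None) -> Dict[str, Any]:
    """JSON-RPC body for A2A message/send, shaped like the curl examples."""
    message: Dict[str, Any] = {
        "role": "user",
        "parts": [{"kind": "text", "text": text}],
    }
    if context_id is not None:
        # Same context ID -> same history scope inside the workspace (assumed).
        message["contextId"] = context_id
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "message/send",
        "params": {"message": message},
    }


def context_id_of(response: Dict[str, Any]) -> Optional[str]:
    """Context ID from a message/send response, or None if the server sent none."""
    result = response.get("result") or {}
    return result.get("contextId")


if __name__ == "__main__":
    turn1 = message_send("My name is Alice. Remember that.", "turn-1")
    # POST turn1 to the workspace's /a2a endpoint; suppose the reply parses to:
    reply = {"jsonrpc": "2.0", "id": "turn-1", "result": {"contextId": "ctx-123"}}
    turn2 = message_send("What is my name?", "turn-2", context_id_of(reply))
    print(turn2["params"]["message"]["contextId"])
    # ctx-123
```

Both turns must also go to the same workspace URL; a context ID from one workspace means nothing to another workspace's history buffer.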
OpenAI-compat shim returns garbled Gemini output
If you are routing a Gemini model through a key that triggers the compat shim (e.g. OPENROUTER_API_KEY), you will see the old role/format translation issues. Switch to GEMINI_API_KEY for native dispatch.
See also
- Concepts — Workspaces
- API Reference — POST /workspaces
- Google ADK Runtime — Gemini-native alternative to Hermes for ADK-first workflows
- PR #240: Phase 2a — native Anthropic dispatch
- PR #255: Phase 2b — native Gemini dispatch
- PR #267: Phase 2c — multi-turn history on all paths
- Issue #513