Molecule AI

Architecture

System architecture, components, infrastructure, and communication model for the Molecule AI platform.

Architecture

Molecule AI is an open-source operating system for AI agent organizations — run the entire stack yourself (self-hosted) or use the hosted SaaS. It orchestrates agent workspaces that form an organizational hierarchy: each workspace registers with a control plane, runs on its own dedicated machine, communicates over the A2A (Agent-to-Agent) protocol, and appears live on a drag-and-drop canvas. The core is provider-agnostic — runtimes, models, and even physical devices are pluggable — so the ecosystem grows without forking the platform.

System Overview

Molecule AI platform architecture: operator surfaces (Canvas, CLI, MCP server, channels, REST) reach a thin central Control Plane (org & member management, billing & credits, tenant provisioning over EC2/Cloudflare/secrets, LLM proxy, provider-registry SSOT) that provisions one isolated tenant per org. Each tenant runs its OWN control core — the workspace-server (provisioner, registry/discovery with CanCommunicate ACL, A2A proxy, WebSocket hub, scheduler, secrets, audit, event store, channels) backed by the tenant's own Postgres and Redis — above a hierarchy of workspaces, each one agent on its own dedicated machine communicating peer-to-peer over A2A governed by the org hierarchy. Runtimes (claude-code, langgraph, autogen, openclaw, hermes, codex, google-adk, external) and model providers (Anthropic, OpenAI, Google Vertex, OpenRouter) are pluggable integrations.

Three properties define the architecture:

  • Hard isolation by machine. Every workspace is one agent on its own dedicated machine (own OS, filesystem, secrets). Workspaces cannot read each other's environment — there is no shared disk or shared process space. A2A over the network is the only sanctioned channel, and it is gated by the org hierarchy (CanCommunicate).
  • Anything can be a runtime. Behind one BaseAdapter contract, a workspace can be any agent framework (claude-code, langgraph, autogen, openclaw, hermes, codex, google-adk), an external/BYO agent, or — on the roadmap — an intelligent device: smart glasses, watches, robots, home/building systems, vehicles. Any A2A/MCP-speaking endpoint joins the org as a governed workspace. Models are equally pluggable (Anthropic, OpenAI/-compatible, Google Vertex/Gemini, OpenRouter).
  • Deep, namespaced memory. A hierarchical memory architecture (HMA) gives each workspace a durable namespace and three scopes — LOCAL (private), TEAM (parent + siblings), GLOBAL (org-wide) — whose reach follows the same org tree as communication.

A text summary of the same flow:

Canvas (Next.js :3000) <--WebSocket--> Platform (Go :8080) <--HTTP--> Postgres + Redis
                                                                        |
                                        Workspace A <----A2A----> Workspace B
                                        (Python agents)
                                              | register/heartbeat |
                                              +------ Platform ----+

This is the per-tenant view: the Canvas provides the visual interface, the Platform (the tenant's workspace-server) is the tenant's control core, and Workspaces are isolated machines running AI agent runtimes. All inter-agent communication is mediated by the Platform via the A2A proxy, which enforces hierarchical access control. Above all tenants sits the central control plane (api.moleculesai.app), which provisions tenants and handles orgs and billing.


Four Main Components

Canvas

Stack: Next.js 15 + React Flow (@xyflow/react v12) + Zustand + Tailwind CSS

The Canvas is the browser-based visual workspace graph. It provides:

  • Drag-and-drop layout with persistent node positions (saved via PATCH /workspaces/:id)
  • Team nesting using recursive TeamMemberChip components (up to 3 levels deep)
  • Real-time status via WebSocket connection to the Platform
  • Chat interface with two sub-tabs: "My Chat" (user-to-agent) and "Agent Comms" (agent-to-agent A2A traffic)
  • Config editor with "Save & Restart" and "Save" (deferred restart) modes
  • Secrets management with auto-restart on POST/DELETE

State management:

ConcernMechanism
Initial loadHTTP fetch GET /workspaces into Zustand
Real-time updatesWebSocket events via applyEvent()
Position persistenceonNodeDragStop sends PATCH /workspaces/:id with {x, y}
Node nestingnestNode sets hidden: !!targetId; children render inside parent

Environment variables:

VariableDefaultPurpose
NEXT_PUBLIC_PLATFORM_URLhttp://localhost:8080Platform API base URL
NEXT_PUBLIC_WS_URLws://localhost:8080/wsWebSocket endpoint

Platform

Stack: Go / Gin (workspace-server)

The Platform is the per-tenant control core — one instance runs on each org's tenant (a *.moleculesai.app instance in SaaS, or your own host when self-hosted) and owns everything for that org's workspaces:

  • Workspace CRUD -- create, read, update, delete workspaces
  • Registry -- workspace registration, heartbeat tracking, agent card management
  • Discovery -- peer lookup, access control checks
  • WebSocket hub -- real-time event broadcasting to Canvas clients
  • Liveness monitoring -- three-layer container health detection
  • A2A proxy -- routes inter-agent messages with hierarchical access control
  • Provisioner -- workspace machine/container lifecycle with tier-based resource limits
  • Scheduler -- cron-based scheduled tasks per workspace
  • Channel adapters -- social integrations (Telegram, Slack, etc.)

It is not the central SaaS control plane. A separate central control plane (api.moleculesai.app, molecule-controlplane) handles org & member management, billing & credits, the LLM proxy, the provider registry, and tenant provisioning — it spins up each org's tenant, and that tenant then runs its own Platform instance above. The Platform's Postgres and Redis (below) are the tenant's own, not shared across orgs.

Key environment variables:

VariableDefaultPurpose
DATABASE_URL(required)Postgres connection string
REDIS_URL(required)Redis connection string
PORT8080Server listen port
PLATFORM_URLhttp://host.docker.internal:PORTURL passed to agent containers
SECRETS_ENCRYPTION_KEY(optional)AES-256 key, 32 bytes
CORS_ORIGINShttp://localhost:3000,http://localhost:3001Allowed CORS origins
RATE_LIMIT600Requests per minute
MOLECULE_ENV(optional)Set production to hide test endpoints
MOLECULE_ORG_ID(optional)SaaS tenant org gating
WORKSPACE_DIR(optional)Global fallback host path for /workspace bind-mount
AWARENESS_URL(optional)Injected into workspace containers for cross-session memory
ACTIVITY_RETENTION_DAYS7How long activity logs are kept
ACTIVITY_CLEANUP_INTERVAL_HOURS6Cleanup sweep interval

Workspace tier resource limits:

TierEnv (Memory)Env (CPU)Defaults
Standard (Tier 2)TIER2_MEMORY_MBTIER2_CPU_SHARES512 MB / 1 CPU
Privileged (Tier 3)TIER3_MEMORY_MBTIER3_CPU_SHARES2048 MB / 2 CPU
Full-host (Tier 4)TIER4_MEMORY_MBTIER4_CPU_SHARES4096 MB / 4 CPU

Workspace Runtime

Published as: molecule-ai-workspace-runtime on PyPI

The shared runtime provides the base agent infrastructure: A2A server, heartbeat loop, config loading, platform auth, plugin system, and built-in tools. Most AI framework adapters live in their own standalone repository.

RuntimeStandalone RepoKey Dependencies
LangGraphmolecule-ai-workspace-template-langgraphlangchain-anthropic, langgraph
Claude Codemolecule-ai-workspace-template-claude-codeclaude-agent-sdk, @anthropic-ai/claude-code
AutoGenmolecule-ai-workspace-template-autogenautogen
Hermesmolecule-ai-workspace-template-hermesopenai, anthropic, google-genai
CodexcodexOpenAI Codex CLI

openclaw and google-adk ship as platform-bundled runtime templates (Gemini 2.5 Pro on Vertex AI via keyless ADC for google-adk); external is a bring-your-own agent with no platform-managed container. The canonical runtime set is the control-plane provider registry.

Each adapter repo has its own Dockerfile that installs molecule-ai-workspace-runtime from PyPI plus adapter-specific dependencies. Templates are cloned at Docker build time into the platform image via manifest.json.

Framework Adapters (workspace-template)

Some workspace templates embed framework-specific adapters that extend molecule-ai-workspace-runtime with framework-level security controls. The smolagents adapter (workspace-template/adapters/smolagents/) ships two such controls:

Environment sanitization (make_safe_env) — child processes spawned by the smolagents adapter inherit a filtered copy of the host environment. The following are stripped before the subprocess starts:

  • Any key listed in SMOLAGENTS_ENV_DENYLIST (comma-separated; set by the operator)
  • Any key whose name ends in _API_KEY or _TOKEN

Set SMOLAGENTS_ENV_DENYLIST=VAR1,VAR2 in the workspace's secrets to extend the denylist.

Safe message delivery (safe_send_message) — outbound smolagents messages are:

  1. Prefixed with [smolagents] so the source is always attributable in logs and Canvas activity
  2. Truncated at 2 000 characters to prevent oversized payloads
  3. HTML-entity-escaped to block social-engineering injections embedded in agent output

These controls complement the platform-level secret redaction described in the API Reference.

molecli

Stack: Go / Bubbletea + Lipgloss

A terminal UI dashboard for real-time workspace monitoring, event log streaming, health overview, and delete/filter operations. Reads MOLECLI_URL (default http://localhost:8080) to locate the platform. Now published as a standalone repo at git.moleculesai.app/molecule-ai/molecule-cli.


Infrastructure Services

All services run via docker-compose.infra.yml, attached to the shared molecule-monorepo-net network. Start them with:

./infra/scripts/setup.sh    # Start Postgres, Redis, Langfuse, Temporal; run migrations

Postgres (port 5432)

Primary datastore for workspaces, events, activity logs, secrets, schedules, channels, and more. Also backs Langfuse and Temporal via separate databases.

Key tables:

TablePurpose
workspacesCore entity -- status, runtime, agent_card, heartbeat, current_task
canvas_layoutsPersisted x/y positions
structure_eventsAppend-only event log
activity_logsA2A communications, task updates, agent logs, errors
workspace_schedulesCron tasks with expression, timezone, prompt, run history
workspace_channelsSocial channel integrations with JSONB config
workspace_secrets / global_secretsEncrypted secrets storage
workspace_auth_tokensBearer tokens (auto-revoked on workspace delete)
agent_memoriesHMA-scoped agent memory
approvalsHuman-in-the-loop approval requests

Migration runner: On startup, the platform globs *.sql in the migrations directory, filters out .down.sql files, sorts alphabetically, and executes each. All .up.sql files must be idempotent (CREATE TABLE IF NOT EXISTS, ALTER TABLE ... IF NOT EXISTS).

JSONB gotcha: When inserting Go []byte (from json.Marshal) into Postgres JSONB columns, you must convert to string() first and use ::jsonb cast in SQL. The lib/pq driver treats []byte as bytea, not JSONB.

Redis (port 6379)

Used for pub/sub event broadcasting and heartbeat TTL tracking. Workspace heartbeat keys expire after 60 seconds -- expiry triggers the liveness monitor.

Langfuse (port 3001)

LLM trace viewer backed by ClickHouse. Provides observability into agent LLM calls, token usage, and latency.

Temporal (port 7233 gRPC, port 8233 Web UI)

Durable workflow engine for workspace-template/builtin_tools/temporal_workflow.py. Dev-only posture: the auto-setup image runs with no auth on 0.0.0.0:7233. Production deployments must gate access via mTLS or an API key / reverse proxy.


Communication Model

WebSocket Events Flow

1. Action occurs (register, heartbeat, config change, etc.)
2. broadcaster.RecordAndBroadcast()
   -> inserts into structure_events table
   -> publishes to Redis pub/sub
3. Redis subscriber relays to WebSocket hub
4. Hub broadcasts to:
   - Canvas clients (all events)
   - Workspace clients (filtered by CanCommunicate)

A2A Proxy

The A2A proxy (POST /workspaces/:id/a2a) routes agent-to-agent messages. The caller identifies itself via the X-Workspace-ID header and authenticates with Authorization: Bearer <token>.

Access Control Rules

Determined by CanCommunicate(callerID, targetID) in registry/access.go:

RelationshipAllowed
Same workspace (self-call)Yes
Siblings (same parent_id)Yes
Root-level siblings (both parent_id IS NULL)Yes
Parent to child / child to parentYes
System callers (webhook:*, system:*, test:*)Yes (bypass)
Canvas requests (no X-Workspace-ID)Yes (bypass)
Everything elseDenied

Import Cycle Prevention

The platform uses function injection to avoid Go import cycles between ws, registry, and events packages:

  • ws.NewHub(canCommunicate AccessChecker) -- Hub accepts registry.CanCommunicate as a function
  • registry.StartLivenessMonitor(ctx, onOffline OfflineHandler) -- Liveness accepts broadcaster callback
  • registry.StartHealthSweep(ctx, checker ContainerChecker, interval, onOffline) -- Health sweep accepts Docker checker interface
  • Wiring happens in platform/cmd/server/main.go -- init order: wh -> onWorkspaceOffline -> liveness/healthSweep -> router

Container Health Detection

Three independent layers detect dead containers (e.g., Docker Desktop crash):

Layer 1: Passive (Redis TTL)

Each workspace sends heartbeats that set a Redis key with a 60-second TTL. When the key expires, the liveness monitor detects the workspace as offline and triggers an auto-restart.

Layer 2: Proactive (Health Sweep)

registry.StartHealthSweep polls the Docker API every 15 seconds. Catches dead containers faster than waiting for Redis TTL expiry.

Layer 3: Reactive (A2A Proxy)

When the A2A proxy encounters a connection error to a workspace, it immediately checks provisioner.IsRunning(). If the container is dead, it marks the workspace offline and triggers a restart.

All three layers call onWorkspaceOffline, which broadcasts WORKSPACE_OFFLINE and initiates wh.RestartByID(). Redis cleanup uses the shared db.ClearWorkspaceKeys() function.


Workspace Lifecycle

provisioning --> online (on register)
     ^              |
     |         degraded (error_rate > 0.5)
     |              |
     |           online (recovered)
     |              |
     |          offline (Redis TTL expired / health sweep)
     |              |
     +--- auto-restart ---+
                    |
                 removed (deleted)

Any state --> paused (user pauses) --> provisioning (user resumes)

Paused workspaces skip health sweep, liveness monitor, and auto-restart.

Restart context: After any restart and successful re-registration, the platform sends a synthetic A2A message/send with metadata.kind=restart_context containing the restart timestamp, previous session info, and available env-var keys (keys only, never values). The sender uses the system:restart-context caller prefix to bypass CanCommunicate. If the workspace does not re-register within 30 seconds, the message is dropped.

Initial prompt: Agents can auto-execute a prompt on startup before any user interaction. Configure via initial_prompt (inline string) or initial_prompt_file (path relative to config dir) in config.yaml. A .initial_prompt_done marker file prevents re-execution on restart.

Idle loop: When idle_prompt is non-empty in config.yaml, the workspace self-sends it every idle_interval_seconds (default 600) while heartbeat.active_tasks == 0. The idle check is local (no LLM call) and the prompt only fires when the agent is genuinely idle.


Deployment Modes

Self-Hosted

Run the full stack on your own infrastructure using Docker Compose:

# Infrastructure only (Postgres, Redis, Langfuse, Temporal)
docker compose -f docker-compose.infra.yml up -d

# Full stack
docker compose up

SaaS

Hosted at moleculesai.app with per-tenant isolation. Each tenant gets a dedicated Fly Machine running the tenant image. The MOLECULE_ORG_ID env var gates API access -- every non-allowlisted request must carry a matching X-Molecule-Org-Id header or gets a 404. When unset, the guard is a passthrough so self-hosted and dev environments are unaffected.

Tenant Image

platform/Dockerfile.tenant bundles the Go platform + Canvas frontend + templates into a single container image, published to ghcr.io/molecule-ai/platform:latest and :sha-<short>.


Subdomain Architecture

SubdomainServicePurpose
moleculesai.appLanding pageMarketing site
app.moleculesai.appSaaS dashboardTenant management UI
api.moleculesai.appCentral control plane (molecule-controlplane)Orgs, members, billing/credits, tenant provisioning, LLM proxy, provider registry (/cp/*)
doc.moleculesai.appDocumentationThis documentation site
status.moleculesai.appStatus pageUptime and incident tracking
*.moleculesai.appTenant instancesPer-org isolated platform instances

Plugin System

Plugins extend workspace capabilities. Two categories exist:

Shared plugins (auto-loaded by every workspace):

  • molecule-dev -- codebase conventions + review-loop skill
  • superpowers -- verification, TDD, systematic debugging, writing plans
  • ecc -- general Claude Code guardrails
  • browser-automation -- Puppeteer/CDP web scraping and live canvas screenshots

Modular guardrails (opt-in per workspace):

  • Hook plugins (ambient enforcement): molecule-careful-bash, molecule-freeze-scope, molecule-audit-trail, molecule-session-context, molecule-prompt-watchdog
  • Skill plugins (on-demand): molecule-skill-code-review, molecule-skill-cross-vendor-review, molecule-skill-llm-judge, molecule-skill-update-docs, molecule-skill-cron-learnings
  • Workflow plugins (slash commands): molecule-workflow-triage, molecule-workflow-retro

Org-template plugin resolution: Per-workspace plugins: lists in org template org.yaml role overrides UNION with defaults.plugins (deduplicated, defaults first). To opt a specific default out for a given role, prefix the plugin name with ! or - (e.g. !browser-automation).

Plugin install safeguards:

ParameterDefaultPurpose
PLUGIN_INSTALL_BODY_MAX_BYTES65536 (64 KiB)Max request body size
PLUGIN_INSTALL_FETCH_TIMEOUT5mWhole fetch+copy deadline
PLUGIN_INSTALL_MAX_DIR_BYTES104857600 (100 MiB)Max staged-tree size

CI Pipeline

GitHub Actions runs on push to main and on pull requests:

JobWhat it does
platform-buildGo build, vet, go test -race with 25% coverage threshold
canvas-buildnpm build, vitest run (tests must exist and pass)
python-lintpytest with coverage for workspace-template
e2e-apiSpins up Postgres + Redis, runs 62 API tests against locally-built binary
shellcheckLints all E2E shell scripts
publish-platform-imageBuilds and pushes to ghcr.io/molecule-ai/platform (main only)

Standalone repos (plugins + templates) use reusable workflows from Molecule-AI/molecule-ci for schema validation, secrets scanning, and Docker build smoke tests.

On this page