Molecule AI

Workspace Lifecycle & Provisioning

Workspace state machine, provisioning and container lifecycle, and the workspace runtime.

Part of the Comprehensive Technical Documentation. Definitive reference based on a non-invasive scan of the molecule-core repository.

5. Workspace Lifecycle

State Machine

provisioning → online ↔ degraded
   ↓             ↓         ↓
 failed       offline    offline
   ↓             ↓
 retry      (auto-restart)

↓ (any state)
paused → (user resumes) → provisioning

↓ (any state)
removed
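
The diagram can be read as a transition table. The following is a minimal Python sketch of that reading, not the platform's implementation. It assumes that retry and auto-restart both re-enter provisioning, that removed is terminal, and that pausing an already paused workspace is not a transition; none of these is stated explicitly above. The names Status, EDGES, and can_transition are hypothetical.

```python
from enum import Enum


class Status(str, Enum):
    PROVISIONING = "provisioning"
    ONLINE = "online"
    DEGRADED = "degraded"
    OFFLINE = "offline"
    PAUSED = "paused"
    FAILED = "failed"
    REMOVED = "removed"


# Explicit edges taken from the diagram.
EDGES = {
    Status.PROVISIONING: {Status.ONLINE, Status.FAILED},
    Status.ONLINE: {Status.DEGRADED, Status.OFFLINE},
    Status.DEGRADED: {Status.ONLINE, Status.OFFLINE},
    Status.FAILED: {Status.PROVISIONING},   # assumption: retry re-provisions
    Status.OFFLINE: {Status.PROVISIONING},  # assumption: auto-restart re-provisions
    Status.PAUSED: {Status.PROVISIONING},   # user resumes
    Status.REMOVED: set(),
}


def can_transition(src: Status, dst: Status) -> bool:
    """Return True if the diagram allows moving from src to dst."""
    if src is Status.REMOVED:
        return False  # assumption: removed is terminal
    if dst is Status.REMOVED:
        return True   # "any state" -> removed
    if dst is Status.PAUSED:
        return src is not Status.PAUSED  # "any state" -> paused
    return dst in EDGES[src]
```

For example, can_transition(Status.PAUSED, Status.ONLINE) is False under this reading, because a resumed workspace must pass through provisioning first.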

Status Definitions

Status        Meaning                                            Canvas Indicator
provisioning  Waiting for first heartbeat                        Spinner
online        Heartbeat received, reachable                      Green dot
degraded      Online but error_rate ≥ 0.5                        Yellow node with warning
offline       Heartbeat TTL expired, unreachable                 Gray node
paused        User paused, container stopped, config preserved   Indigo badge
failed        Provisioning timeout or launch error               Red node + retry button
removed       Deleted, kept for event log history                Node removed from Canvas
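
The health-driven rows of this table can be expressed as a small decision function. This is a sketch under assumptions: the function name and its three inputs are hypothetical, and it covers only the statuses derived from heartbeats and error rate. The paused, failed, and removed statuses are set by user actions or the provisioner and are not derived here.

```python
DEGRADED_ERROR_RATE = 0.5  # threshold from the table: error_rate >= 0.5


def observed_status(heartbeat_seen: bool, heartbeat_live: bool, error_rate: float) -> str:
    """Derive a health-driven status from heartbeat state and error rate.

    heartbeat_seen: at least one heartbeat has ever arrived.
    heartbeat_live: the heartbeat TTL has not expired.
    """
    if not heartbeat_seen:
        return "provisioning"  # still waiting for the first heartbeat
    if not heartbeat_live:
        return "offline"       # heartbeat TTL expired
    if error_rate >= DEGRADED_ERROR_RATE:
        return "degraded"      # reachable, but failing too often
    return "online"
```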

Health Detection (Three Layers)

Layer      Mechanism                   Interval            Trigger
Passive    Redis TTL expiry            60s heartbeat key   Liveness monitor callback
Proactive  Docker API poll             Every 15s           Health sweep goroutine
Reactive   A2A proxy connection error  On-demand           provisioner.IsRunning() check

All three layers call onWorkspaceOffline(), which broadcasts WORKSPACE_OFFLINE and triggers an auto-restart.
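
The convergence of the three layers on one handler can be illustrated as follows. The platform side uses goroutines, so this Python version is only an analogy for illustration. OfflineHandler and the three layer functions are hypothetical names, and the broadcast and restart callables stand in for whatever the platform actually does.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class OfflineHandler:
    broadcast: Callable[[str, str], None]  # (event_name, workspace_id)
    restart: Callable[[str], None]         # (workspace_id)

    def on_workspace_offline(self, workspace_id: str) -> None:
        # Single exit point shared by all detection layers.
        self.broadcast("WORKSPACE_OFFLINE", workspace_id)
        self.restart(workspace_id)


def passive_ttl_expired(handler: OfflineHandler, workspace_id: str) -> None:
    """Passive layer: invoked when the heartbeat key expires."""
    handler.on_workspace_offline(workspace_id)


def proactive_sweep(handler: OfflineHandler, workspace_ids: Iterable[str],
                    is_running: Callable[[str], bool]) -> None:
    """Proactive layer: periodic poll over all known workspaces."""
    for workspace_id in workspace_ids:
        if not is_running(workspace_id):
            handler.on_workspace_offline(workspace_id)


def reactive_proxy_error(handler: OfflineHandler, workspace_id: str,
                         is_running: Callable[[str], bool]) -> None:
    """Reactive layer: after a proxy connection error, confirm before acting."""
    if not is_running(workspace_id):
        handler.on_workspace_offline(workspace_id)
```

Routing every layer through one handler keeps the broadcast and restart behavior identical regardless of which layer noticed the failure first.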

Cascade Behavior

  • Pause: Pausing a parent cascades to all of its children. Children of a paused parent cannot be resumed individually.
  • Delete: Removes the container and cleans memory (DB rows, Redis keys). Structure events and Agent Card history are never deleted.
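
The pause rule can be sketched over a simple in-memory tree. The Node class and helper names are hypothetical. Whether resuming a parent also resumes its children is not stated above, so this sketch resumes only the node it is given.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    id: str
    status: str = "online"
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)


def add_child(parent: Node, child: Node) -> None:
    child.parent = parent
    parent.children.append(child)


def pause(node: Node) -> None:
    """Pause a workspace and cascade to every descendant."""
    node.status = "paused"
    for child in node.children:
        pause(child)


def resume(node: Node) -> None:
    """Resume a workspace unless its parent is still paused."""
    if node.parent is not None and node.parent.status == "paused":
        raise ValueError(f"{node.id}: parent {node.parent.id} is paused")
    node.status = "provisioning"  # resumed workspaces re-enter provisioning
```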

11. Provisioning & Container Lifecycle

Docker Networking

  • All containers join the private molecule-monorepo-net network
  • Container naming: ws-{workspace_id[:12]}
  • Ephemeral host port binding: 127.0.0.1:0→8000/tcp

URL Resolution

Caller                 URL Type         Example
Workspace (container)  Docker-internal  http://ws-{id}:8000
Canvas (browser)       Host-mapped      http://127.0.0.1:{ephemeral_port}
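
The naming rule and the two URL forms can be combined in a short sketch. The function names and the caller labels "workspace" and "canvas" are hypothetical, and the sketch assumes that {id} in the Docker-internal URL is the same 12-character prefix used in the container name.

```python
from typing import Optional

CONTAINER_PORT = 8000  # container-side port from the port binding above


def container_name(workspace_id: str) -> str:
    """ws-{workspace_id[:12]}: the first 12 characters of the workspace ID."""
    return f"ws-{workspace_id[:12]}"


def resolve_url(workspace_id: str, caller: str,
                ephemeral_port: Optional[int] = None) -> str:
    """Pick the URL form appropriate to the caller's network position."""
    if caller == "workspace":
        # Containers share the private network, so the container name resolves.
        return f"http://{container_name(workspace_id)}:{CONTAINER_PORT}"
    if caller == "canvas":
        # The browser runs on the host and needs the host-mapped port.
        if ephemeral_port is None:
            raise ValueError("canvas callers need the ephemeral host port")
        return f"http://127.0.0.1:{ephemeral_port}"
    raise ValueError(f"unknown caller: {caller!r}")
```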

Container Cleanup on Delete

  1. Docker container stopped and removed
  2. Memory cleaned (DB rows, Redis keys)
  3. Status set to removed
  4. WORKSPACE_REMOVED event written to structure_events
  5. Structure events and Agent Card history never deleted (audit trail)
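
The ordering above can be illustrated with an in-memory stand-in. PlatformSketch and its fields are hypothetical and replace the real Docker, database, and Redis calls with plain Python containers; the point is only which stores are cleared and which are left untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class PlatformSketch:
    containers: Set[str] = field(default_factory=set)
    memory: Dict[str, dict] = field(default_factory=dict)
    status: Dict[str, str] = field(default_factory=dict)
    structure_events: List[Tuple[str, str]] = field(default_factory=list)
    agent_card_history: Dict[str, List[dict]] = field(default_factory=dict)

    def delete_workspace(self, workspace_id: str) -> None:
        self.containers.discard(workspace_id)   # 1. stop and remove the container
        self.memory.pop(workspace_id, None)     # 2. clean memory
        self.status[workspace_id] = "removed"   # 3. mark as removed
        self.structure_events.append(           # 4. record the removal event
            ("WORKSPACE_REMOVED", workspace_id)
        )
        # 5. structure_events and agent_card_history are append-only here:
        #    nothing is ever deleted from them, preserving the audit trail.
```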

12. Workspace Runtime

Entry Point: workspace/main.py

Startup Sequence (10 steps):

  1. Initialize telemetry (OpenTelemetry, no-op if packages absent)
  2. Load config.yaml into WorkspaceConfig dataclass
  3. Run preflight validation (model availability, skills, configs)
  4. Create HeartbeatLoop for background task tracking
  5. Resolve adapter from runtime field in config
  6. Run adapter setup() and create_executor()
  7. Build Agent Card from loaded skills + runtime capabilities
  8. Register: POST /registry/register with workspace ID + Agent Card
  9. Start heartbeat loop (30s interval) + skill hot-reload watcher
  10. Serve A2A over Uvicorn on configured port
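
Step 9 can be pictured as a background asyncio task. The internals of the real HeartbeatLoop class are not described here, so the following is a generic sketch with hypothetical names: send stands in for whatever request the runtime makes, and stop is an illustrative shutdown signal.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def heartbeat_loop(send: Callable[[], Awaitable[None]],
                         interval: float = 30.0,
                         stop: Optional[asyncio.Event] = None) -> None:
    """Call send() immediately, then once per interval until stop is set."""
    if stop is None:
        stop = asyncio.Event()
    while not stop.is_set():
        await send()
        try:
            # Sleep for one interval, but wake early if asked to stop.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass  # interval elapsed without a stop request; beat again
```

With a 30s interval against the 60s heartbeat key listed under Health Detection, one heartbeat can be missed before the key expires.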

Runtime Configuration Schema (config.yaml)

name: "Workspace Name"
description: ""
version: "1.0.0"
tier: 2                                    # 1=sandboxed, 2=standard, 3=privileged, 4=full-host
model: "anthropic:claude-sonnet-4-6"       # provider:model syntax
runtime: "langgraph"                       # claude-code | langgraph | autogen | openclaw | hermes | codex | google-adk | external
runtime_config:                            # Runtime-specific settings
  command: "claude"                        # For CLI runtimes
  args: []
  auth_token_file: ".auth-token"
  timeout: 0
  model: ""                                # Override model just for this runtime
skills: ["skill1", "skill2"]               # Folder names under skills/
tools: ["web_search", "filesystem"]        # Built-in tool names
prompt_files: ["system-prompt.md"]         # Additional prompt text files
shared_context: []                         # Files from parent workspace

a2a:
  port: 8000
  streaming: true
  push_notifications: true

delegation:
  retry_attempts: 3
  retry_delay: 5.0
  timeout: 120.0
  escalate: true

sandbox:
  backend: "subprocess"                    # subprocess | docker
  memory_limit: "256m"
  timeout: 30

rbac:
  roles: ["operator"]
  allowed_actions: {}

hitl:
  channels:
    - type: "dashboard"
  default_timeout: 300
  bypass_roles: []

governance:
  enabled: false
  policy_mode: "audit"                     # audit | permissive | strict
  policy_file: ""

security_scan:
  mode: "warn"                             # warn | block | off

compliance:
  mode: "owasp_agentic"
  prompt_injection: "detect"               # detect | block
  max_tool_calls_per_task: 50
  max_task_duration_seconds: 300
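
A few of the constraints noted in the schema comments can be checked with plain Python. This is a sketch only: the real preflight validation is not shown in this document, the names ModelRef, parse_model, and validate_config are hypothetical, and the input is assumed to be the already-parsed YAML as a dict.

```python
from dataclasses import dataclass
from typing import List

# Values taken from the comments in the schema above.
RUNTIMES = {"claude-code", "langgraph", "autogen", "openclaw",
            "hermes", "codex", "google-adk", "external"}
TIERS = {1: "sandboxed", 2: "standard", 3: "privileged", 4: "full-host"}


@dataclass(frozen=True)
class ModelRef:
    provider: str
    name: str


def parse_model(spec: str) -> ModelRef:
    """Split provider:model on the first colon only."""
    provider, sep, name = spec.partition(":")
    if not (sep and provider and name):
        raise ValueError(f"expected provider:model, got {spec!r}")
    return ModelRef(provider, name)


def validate_config(config: dict) -> List[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    errors: List[str] = []
    if config.get("tier") not in TIERS:
        errors.append("tier must be one of 1, 2, 3, 4")
    if config.get("runtime") not in RUNTIMES:
        errors.append(f"unknown runtime: {config.get('runtime')!r}")
    try:
        parse_model(str(config.get("model") or ""))
    except ValueError as exc:
        errors.append(str(exc))
    return errors
```

Splitting on the first colon only means a model name that itself contains a colon stays intact in ModelRef.name.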

Seven Runtime Adapters

Adapter      Core Strength                                        Image Tag
LangGraph    Graph-based state machine, tool use, streaming       workspace-template:langgraph
Claude Code  Native coding workflows, CLI continuity, OAuth auth  workspace-template:claude-code
AutoGen      Multi-agent conversations, explicit strategies       workspace-template:autogen
OpenClaw     CLI-native runtime, own session model                workspace-template:openclaw
Hermes       Stacked system messages, native tool calls, Kimi     workspace-template:hermes
Codex        OpenAI Codex CLI, OAuth/API/platform arms            workspace-template:codex
Google ADK   Gemini 2.5 Pro on Vertex AI, keyless ADC/WIF         workspace-template:google-adk

Branch-level WIP: NemoClaw (NVIDIA T4 + Docker socket) on feat/nemoclaw-t4-docker.

Each adapter implements setup() + create_executor(). The base adapter provides shared infrastructure: system prompt assembly, skill loading, tool registration, coordinator detection, plugin injection.
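
The two-method contract, together with resolution from the runtime field, might look like the sketch below. Only the method names setup() and create_executor() come from this document. The class names, the registry, the "echo" runtime, the executor's call signature, and the choice of synchronous methods are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class BaseAdapter(ABC):
    """Hypothetical base class; shared infrastructure would live here."""

    def __init__(self, config: dict) -> None:
        self.config = config

    @abstractmethod
    def setup(self) -> None:
        """Prepare runtime-specific resources."""

    @abstractmethod
    def create_executor(self) -> Callable[[str], str]:
        """Return a callable that handles one task."""


ADAPTERS: Dict[str, Type[BaseAdapter]] = {}


def register(runtime: str):
    """Class decorator mapping a runtime name to an adapter class."""
    def decorator(cls: Type[BaseAdapter]) -> Type[BaseAdapter]:
        ADAPTERS[runtime] = cls
        return cls
    return decorator


@register("echo")  # illustrative runtime, not one of the seven adapters
class EchoAdapter(BaseAdapter):
    def setup(self) -> None:
        self.prefix = self.config.get("name", "workspace")

    def create_executor(self) -> Callable[[str], str]:
        return lambda task: f"[{self.prefix}] {task}"


def resolve_adapter(config: dict) -> BaseAdapter:
    """Look up the adapter class named by the config's runtime field."""
    cls = ADAPTERS.get(config.get("runtime"))
    if cls is None:
        raise ValueError(f"no adapter for runtime {config.get('runtime')!r}")
    return cls(config)
```

Usage follows startup steps 5 and 6: resolve the adapter from the config, call setup(), then call create_executor() and hand the result to the server.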
