Agentic Sandbox

Self-hostable runtime for persistent autonomous coding agents.

KVM-isolated VMs (or rootless containers) for long-running agent sessions. Management server with gRPC, WebSocket, and HTTP interfaces. Web dashboard, CLI, and REST API. Runs on your hardware; no hosted control plane.

git clone https://github.com/jmagly/agentic-sandbox.git
cd agentic-sandbox && make build && cd management && ./dev.sh
# open http://localhost:8122 → "+ Create Instance" → Container → Create → done

New here? Walk through Getting Started — prerequisite check, ~15 min to first running agent.

Features · Quick Start · Architecture · API

Features

Persistent sessions. Each agent runs inside its own VM (or container) with a persistent gRPC link to the management server. Closing your terminal does not stop the agent.
Hardware isolation. Full KVM virtualization — each agent gets its own kernel. Rootless Docker is supported as a lighter-weight alternative.
Shared storage with explicit namespaces. virtiofs-backed global (read-only) and inbox (read-write per-agent) mounts.
Live terminal observability. The server streams every PTY chunk to the dashboard, and server-side virtual terminal snapshots are available via REST.
Human-in-the-loop. PTY heuristics detect (y/n) and similar pauses, file a HITL request, and inject your response back into stdin.
Restart-safe. Session reconciliation, crash-loop detection, and ephemeral per-VM secrets.
Resource governance. Declarative quotas and per-VM CPU/memory/disk limits.

Part of the AIWG Suite

Agentic Sandbox is the runtime substrate for the AIWG SDLC suite. AIWG provides the agents, skills, and workflow scaffolding; Agentic Sandbox provides the isolated execution environment. Either can be used independently.

Quick Start

Full walkthrough — including prerequisite verification, build-time expectations, and troubleshooting — is in docs/getting-started.md. The summary below assumes the prerequisites are already installed.

Prerequisites: Linux host. For the container path (fastest): Rust 1.75+, protoc, Docker. For the VM path (full isolation): all of the above plus KVM (egrep -c '(vmx|svm)' /proc/cpuinfo should print a number greater than 0), libvirt + QEMU (apt install qemu-kvm libvirt-daemon-system), and an Ubuntu 24.04 base image (cd images/qemu && ./build-base-image.sh 24.04).
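The KVM prerequisite can also be checked from a script. The following is a minimal sketch, assuming Python 3 is available on the host; the function names are hypothetical and not part of this repository. It only inspects CPU flags, so it does not prove that libvirt or /dev/kvm are usable.

```python
import re

# Same pattern the egrep prerequisite check uses: Intel VT-x (vmx) or AMD-V (svm).
VIRT_FLAG = re.compile(r"vmx|svm")


def count_virt_flag_lines(cpuinfo_text: str) -> int:
    """Count lines that mention vmx or svm, like `egrep -c '(vmx|svm)'`."""
    return sum(1 for line in cpuinfo_text.splitlines() if VIRT_FLAG.search(line))


def host_supports_kvm(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """Return True when at least one line advertises a virtualization flag."""
    with open(cpuinfo_path, encoding="utf-8") as handle:
        return count_virt_flag_lines(handle.read()) > 0
```

Calling host_supports_kvm() on the target machine gives the same yes/no answer as reading the egrep count by eye.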

The recommended path launches the full system — management server + dashboard. From the dashboard you can create VM or container instances, attach terminal panes, and watch live events without ever touching a shell. Power-user shortcuts for skipping the dashboard are below.

Start the full system (recommended)

# 1. Build all three crates (management server, agent client, CLI)
make build      # or: ( cd management && cargo build --release ) && \
                #     ( cd agent-rs   && cargo build --release ) && \
                #     ( cd cli        && cargo build --release )

# 2. Start the management server. Dashboard is at http://localhost:8122,
#    WebSocket at ws://localhost:8121, gRPC at :8120.
cd management && ./dev.sh

# 3. Open the dashboard in a browser:
#    http://localhost:8122

In the dashboard:

1. Click + Create Instance in the sidebar header.
2. Pick Runtime:
   - Container: fast (~2s), backed by Docker. Choose an agent image from the dropdown (agentic/claude:latest, codex, opencode).
   - VM: full hardware isolation, ~30s–10m to provision depending on loadout. Pick a loadout (claude-only, full-suite, dual-review, etc.).
3. Name it (agent-01, my-codex, anything matching [a-z0-9-]+).
4. Click Create. The instance appears in the sidebar with a [VM] or [CT] badge.
5. Click the row, then click 📺 Pane to attach a terminal session.

Stop / Restart / Force off / Delete are all per-row buttons; the pane has a ⟳ Resync button if the terminal ever drifts.
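If you generate instance names from a script, it can help to check them against the naming pattern quoted in the steps above before calling the dashboard or CLI. This is a sketch only: the helper name is hypothetical, and the server may enforce extra rules (length limits, reserved names) that are not documented here.

```python
import re

# Pattern quoted in the dashboard steps: lowercase letters, digits, and hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_instance_name(name: str) -> bool:
    """Return True when the whole name matches [a-z0-9-]+."""
    return NAME_PATTERN.fullmatch(name) is not None
```

fullmatch is used instead of match so that a name such as agent-01! is rejected rather than accepted on its valid prefix.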

Same flow from the CLI

If you'd rather not open a browser, the sandboxctl CLI (also installed as agentic-sandbox) does everything the dashboard does:

# After `make build`, install or symlink the binary:
ln -sf "$(pwd)/cli/target/release/sandboxctl" ~/.local/bin/

# Configure a context pointing at the local management server (one-time)
sandboxctl config set-context local --server http://localhost:8122

# Spawn a container-runtime agent
sandboxctl container create agent-01 --image agentic/claude:latest

# Or a VM-runtime agent
sandboxctl vm create agent-02 --loadout profiles/claude-only.yaml --agentshare --start

# List instances
sandboxctl agent list

# Find a session on the agent, then attach (Ctrl-A d to detach)
sandboxctl session list --agent agent-01
sandboxctl session attach <session-id> --write

# Submit a long-running task from a manifest file
cat > task.yaml <<'EOF'
prompt: "Refactor the authentication module to use JWT refresh tokens"
repository: "https://github.com/myorg/myapp"
model: "claude-opus-4-6"
timeout_seconds: 7200
EOF
sandboxctl task submit --file task.yaml --wait

Run sandboxctl --help for the full noun-first verb tree (agent / session / container / vm / task / hitl / loadout / storage / event / health / ops).

Advanced: skip the dashboard, provision a VM directly

For air-gapped boxes, scripted environments, or when you want a single VM without running the management server, drive the provisioner directly:

./images/qemu/provision-vm.sh agent-01 \
  --loadout profiles/claude-only.yaml \
  --agentshare \
  --start

# The agent inside the VM will try to dial host.internal:8120 in a loop.
# Start the management server first if you want gRPC + the dashboard;
# otherwise the VM is still SSH-reachable as a plain isolated environment:
ssh -i /var/lib/agentic-sandbox/secrets/ssh-keys/agent-01 agent@<vm-ip>

Useful flags: --profile basic (minimal cloud-init), --cpus 8 --memory 16G --disk 100G, --network-mode isolated|allowlist|full. See images/qemu/README.md for the full reference.

Submit a task via REST

If you're scripting against the API directly:

curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Refactor the authentication module to use JWT refresh tokens",
    "repository": "https://github.com/myorg/myapp",
    "model": "claude-opus-4-6",
    "timeout_seconds": 7200
  }'
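The same request can be built from Python with only the standard library. The sketch below mirrors the curl call field for field; the function name is hypothetical, and the shape of the server's response is not documented in this README, so the example stops at constructing the request.

```python
import json
import urllib.request


def build_task_request(base_url, prompt, repository, model, timeout_seconds):
    """Build (but do not send) a POST /api/v1/tasks request."""
    payload = {
        "prompt": prompt,
        "repository": repository,
        "model": model,
        "timeout_seconds": timeout_seconds,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it to a running management server.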

For the full provisioning, profile, and loadout reference, see docs/LOADOUTS.md and the Provisioning section below.

Architecture

Topology

Host
├── agent-01 (KVM VM)   192.168.122.201
│   ├── Claude Code
│   ├── Rust toolchain
│   └── agent-client → gRPC → Management Server
├── agent-02 (KVM VM)   192.168.122.202
│   └── agent-client → gRPC → Management Server
└── Management Server   :8120 gRPC  :8121 WS  :8122 HTTP

Each agent runs in a QEMU/KVM virtual machine provisioned from a cloud-init manifest. VMs are first-class objects with independent CPU, memory, and disk quotas, isolated libvirt networking, and ephemeral per-VM secrets. Docker containers are supported as a lighter-weight alternative for faster iteration.

Management Server

A Rust async server (Tokio, Tonic, Axum) that coordinates all connected agents:

┌─────────────────────────────────────────────────────────────┐
│                  Management Server (Rust)                    │
│                                                              │
│  gRPC :8120          WebSocket :8121        HTTP :8122       │
│  ┌──────────────┐    ┌───────────────┐    ┌──────────────┐  │
│  │ AgentService │    │ WebSocketHub  │    │ HTTP API     │  │
│  │ Connect()    │    │ terminal I/O  │    │ dashboard    │  │
│  │ Exec()       │    │ metrics push  │    │ REST CRUD    │  │
│  └──────────────┘    └───────────────┘    └──────────────┘  │
│                                                              │
│  AgentRegistry  CommandDispatcher  OutputAggregator          │
│  HitlStore      ScreenRegistry     CrashLoopDetector         │
│  TaskOrchestrator                  AiwgServeHandle           │
└─────────────────────────────────────────────────────────────┘

Agent state — heartbeats, metrics, setup progress, loadout metadata — is tracked in-memory via DashMap and exposed through all three interfaces.

Task Orchestrator

Submit long-running AI tasks. The orchestrator assigns each task to an available VM, monitors it through completion, and streams its logs via SSE:

PENDING → STAGING → PROVISIONING → READY → RUNNING → COMPLETING → COMPLETED
                                                  ↘                ↘
                                               FAILED           CANCELLED
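The lifecycle above can be modelled as a small transition table, which is handy for client-side validation or tests. This is an illustrative sketch, not the orchestrator's implementation. The diagram does not spell out exactly which states can reach FAILED or CANCELLED, so the code assumes any non-terminal state can; see docs/task-run-lifecycle.md for the authoritative rules.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "PENDING"
    STAGING = "STAGING"
    PROVISIONING = "PROVISIONING"
    READY = "READY"
    RUNNING = "RUNNING"
    COMPLETING = "COMPLETING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Left-to-right order of the main path in the diagram.
HAPPY_PATH = [
    TaskState.PENDING, TaskState.STAGING, TaskState.PROVISIONING,
    TaskState.READY, TaskState.RUNNING, TaskState.COMPLETING,
    TaskState.COMPLETED,
]
TERMINAL = {TaskState.COMPLETED, TaskState.FAILED, TaskState.CANCELLED}


def can_transition(current: TaskState, target: TaskState) -> bool:
    """Return True when current -> target is allowed in this sketch."""
    if current in TERMINAL:
        return False  # terminal states have no outgoing edges
    if target in (TaskState.FAILED, TaskState.CANCELLED):
        return True  # ASSUMPTION: reachable from every non-terminal state
    # Otherwise only the next step on the main path is allowed.
    return HAPPY_PATH[HAPPY_PATH.index(current) + 1] is target
```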

Tasks receive a dedicated workspace in agentshare:

/srv/agentshare/
├── tasks/{task_id}/manifest.yaml   # Task metadata
├── inbox/{task_id}/                # Input files (read-only inside VM)
└── outbox/{task_id}/               # Artifacts written by agent

Agentshare Storage

VMs get virtiofs-mounted shared storage with separate read-only and read-write namespaces:

Mount	VM Path	Mode	Purpose
Global	`/mnt/global` (`~/global`)	Read-only	Shared tools, prompts, configs
Inbox	`/mnt/inbox` (`~/inbox`)	Read-write	Task inputs, run logs, outputs

The inbox layout provides structured access patterns — agents find their task workspace at ~/inbox/current/ without needing to know task IDs.

Human-in-the-Loop (HITL)

The management server monitors PTY output and automatically detects when an agent is waiting for human input. Detection runs after every output chunk through a scored heuristic that recognizes patterns like (y/n), [Y/n], Human:, ❯, and explicit confirmation phrases.

Agent PTY output
      │
      ▼
prompt_detector::detect_prompt()   ← scores output chunk
      │
  score ≥ 0.85
      │
      ▼
HitlStore::create()                ← deduplicates per session
      │
      ├── REST: GET /api/v1/hitl          (operator polls)
      ├── Dashboard: pending requests UI
      └── AiwgServeHandle::emit()         (if aiwg serve wired in)
                    │
              operator responds
                    │
                    ▼
POST /api/v1/hitl/{id}/respond     ← injects text into PTY stdin

One pending request per session at a time — duplicate detections are suppressed until the active request is resolved.
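To make the flow concrete, here is a simplified Python model of threshold-based detection plus per-session deduplication. It is a sketch, not the Rust code in prompt_detector.rs or hitl.rs: the pattern weights are invented for illustration, and only the 0.85 threshold and the one-pending-request-per-session rule come from the description above.

```python
import re
import uuid

THRESHOLD = 0.85  # threshold shown in the flow above

# HYPOTHETICAL weights; the real heuristic's scoring is not documented here.
PATTERNS = [
    (re.compile(r"\(y/n\)", re.IGNORECASE), 0.9),
    (re.compile(r"\[y/n\]", re.IGNORECASE), 0.9),
    (re.compile(r"Human:\s*$"), 0.85),
    (re.compile(r"❯\s*$"), 0.85),
]


def score_chunk(chunk: str) -> float:
    """Return the highest weight among matching patterns, or 0.0."""
    return max((w for p, w in PATTERNS if p.search(chunk)), default=0.0)


class HitlStore:
    """Keeps at most one pending request per session."""

    def __init__(self):
        self._pending = {}  # session_id -> (request_id, prompt_text)

    def create(self, session_id, prompt_text):
        if session_id in self._pending:
            return None  # duplicate suppressed while a request is pending
        request_id = uuid.uuid4().hex
        self._pending[session_id] = (request_id, prompt_text)
        return request_id

    def resolve(self, session_id):
        self._pending.pop(session_id, None)  # frees the slot


def on_pty_chunk(store, session_id, chunk):
    """Score one output chunk and file a request when it crosses the threshold."""
    if score_chunk(chunk) >= THRESHOLD:
        return store.create(session_id, chunk)
    return None
```

Taking the maximum weight rather than summing keeps a single strong signal such as (y/n) sufficient on its own; a summed score is an equally plausible design.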

aiwg Serve Integration

When AIWG_SERVE_ENDPOINT is set, the management server registers with an aiwg serve dashboard and streams live sandbox events over a persistent authenticated WebSocket. The integration reconnects with exponential backoff (1 s → 30 s) and never blocks server startup.
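The reconnect schedule can be pictured as a capped exponential sequence. The sketch below assumes the delay doubles on each attempt and uses no jitter; the text above only states the 1 s starting point and the 30 s ceiling, so the growth factor is an assumption.

```python
import itertools


def backoff_delays(initial=1.0, maximum=30.0, factor=2.0):
    """Yield reconnect delays in seconds, growing by `factor` up to `maximum`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


# First seven delays under the doubling assumption.
print(list(itertools.islice(backoff_delays(), 7)))
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```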

The sandbox also registers as an AIWG executor (per executor.v1.md). It accepts mission dispatches via POST /api/v1/sessions/:id/dispatch and reports the full mission.* lifecycle (assigned → started → completed/failed/aborted, with HITL and resumability) over a second WebSocket at /ws/executors/{id}. Mission state persists across management-server restarts in <secrets_dir>/../missions.json. Full integration spec: docs/aiwg-executor.md.

Event	Trigger
`agent.connected`	gRPC stream registered
`agent.disconnected`	gRPC stream closed or timed out
`agent.ready`	cloud-init provisioning complete
`agent.provisioning`	loadout step progress
`session.start` / `session.end`	PTY/exec session lifecycle
`hitl.input_required`	HITL prompt detected

A Real Walkthrough

What a typical autonomous coding task looks like end to end.

Provision

./images/qemu/provision-vm.sh agent-01 \
  --loadout profiles/claude-only.yaml \
  --agentshare \
  --start

The VM boots, cloud-init runs the loadout manifest, agent-client registers via gRPC, and the status transitions Starting → Provisioning → Ready. If aiwg serve is configured, the agent.ready event fires.

Submit a Task

curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Refactor the authentication module to use JWT refresh tokens",
    "repository": "https://github.com/myorg/myapp",
    "model": "claude-opus-4-6",
    "timeout_seconds": 7200
  }'

The task is assigned to agent-01, the repository is cloned into the inbox, and Claude Code is launched inside the VM.

Monitor in Real Time

Open http://localhost:8122 for the live terminal stream, or:

curl http://localhost:8122/api/v1/tasks/{task_id}/logs

Agent Pauses — HITL

An hour in, Claude Code hits an ambiguous refactor decision and prints a confirmation prompt. The dashboard shows a pending HITL request. Respond without opening a terminal:

curl -X POST http://localhost:8122/api/v1/hitl/{hitl_id}/respond \
  -H "Content-Type: application/json" \
  -d '{"response": "yes, update all callers"}'

The response text is injected into the agent's PTY stdin and the agent continues.

Collect Artifacts

ls /srv/agentshare/outbox/{task_id}/
# auth-module/  jwt-refresh.ts  test-results.json  SUMMARY.md

Provisioning

Profiles

Pre-built profiles for common setups:

Profile	Tools	Use Case
`agentic-dev`	Python (uv), Node.js (fnm), Go, Rust, Claude Code, Aider, Docker, ripgrep, fd, jq	Full development environment
`basic`	SSH, basic utilities	Minimal — custom setup via cloud-init

./images/qemu/provision-vm.sh my-agent \
  --profile agentic-dev \
  --cpus 8 \
  --memory 16384 \
  --disk 100G \
  --agentshare \
  --start

Loadout Manifests

Declarative YAML manifests for composable provisioning. Loadouts specify tools, runtimes, AI providers, and AIWG frameworks without modifying base profiles:

# profiles/claude-only.yaml
name: claude-only
tools:
  - claude-code
  - ripgrep
  - fd
  - jq
runtimes:
  - python-uv
  - nodejs-fnm
aiwg_frameworks:
  - name: sdlc-complete
    providers: [claude]

See docs/LOADOUTS.md for the full manifest schema and available options.

Task Orchestration

Submit tasks to agents via the REST API. The orchestrator assigns tasks to available VMs, manages the workspace, and tracks lifecycle state.

# Submit a task
curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Audit the API for SQL injection vulnerabilities",
    "repository": "https://github.com/myorg/myapp",
    "model": "claude-opus-4-6",
    "timeout_seconds": 3600
  }'

# Check status
curl http://localhost:8122/api/v1/tasks/{task_id}

# Stream logs (SSE)
curl http://localhost:8122/api/v1/tasks/{task_id}/logs

# List artifacts
curl http://localhost:8122/api/v1/tasks/{task_id}/artifacts

# List A2A task artifacts captured by messages:send
curl http://localhost:8122/agents/{instance_id}/v1/tasks/{task_id}/artifacts

See docs/task-orchestration-api.md for full API details and docs/task-run-lifecycle.md for the lifecycle state machine.
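When consuming the log stream from a program instead of curl, the client needs to split the Server-Sent Events framing into individual events. The sketch below follows the generic SSE wire format (data: lines, blank line ends an event, lines starting with a colon are comments). The content of each log event is not described in this README, so the payloads are treated as opaque strings.

```python
def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            data_parts.append(value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
```

With urllib, the response object returned by urlopen for the logs URL can be iterated line by line, decoded, and fed straight into iter_sse_data.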

Human-in-the-Loop (HITL)

The server monitors agent PTY output and automatically detects when an agent is waiting for human input. When detected, a HITL request is created and held until resolved.

# List pending requests
curl http://localhost:8122/api/v1/hitl

# Respond — text is injected directly into the agent's PTY stdin
curl -X POST http://localhost:8122/api/v1/hitl/a3f1b2.../respond \
  -H "Content-Type: application/json" \
  -d '{"response": "y"}'

Requests are deduplicated per session — a second prompt won't fire while the first is pending. Once resolved, the slot opens again.

VM Lifecycle

# Provision and start
./images/qemu/provision-vm.sh agent-01 --profile agentic-dev --agentshare --start

# Lifecycle management
virsh start agent-01          # start stopped VM
virsh shutdown agent-01       # graceful stop
virsh destroy agent-01        # force stop

# Rebuild (preserves IP and config)
./scripts/reprovision-vm.sh agent-01 --profile agentic-dev

# Remove completely
./scripts/destroy-vm.sh agent-01

# Deploy updated agent binary to running VM
./scripts/deploy-agent.sh agent-01 --debug

See docs/vm-lifecycle.md for the state machine and docs/LIFECYCLE.md for the full operations reference.

API Reference

Agents

Endpoint	Method	Description
`/api/v1/agents`	GET	List registered agents with metrics and loadout info
`/api/v1/agents/{id}`	GET	Get agent details
`/api/v1/agents/{id}`	DELETE	Remove agent
`/api/v1/agents/{id}/start`	POST	Start agent VM
`/api/v1/agents/{id}/stop`	POST	Stop agent VM
`/api/v1/agents/{id}/destroy`	POST	Force destroy agent VM
`/api/v1/agents/{id}/reprovision`	POST	Reprovision agent VM

Tasks

Endpoint	Method	Description
`/api/v1/tasks`	GET	List tasks
`/api/v1/tasks`	POST	Submit new task
`/api/v1/tasks/{id}`	GET	Get task status and metadata
`/api/v1/tasks/{id}`	DELETE	Cancel task
`/api/v1/tasks/{id}/logs`	GET	Stream task logs (SSE)
`/api/v1/tasks/{id}/artifacts`	GET	List task artifacts
`/agents/{instance_id}/v1/tasks/{task_id}/artifacts`	GET	List persisted A2A task artifacts
`/agents/{instance_id}/v1/tasks/{task_id}/artifacts/{artifact_id}`	GET	Return one persisted A2A task artifact

VMs

Endpoint	Method	Description
`/api/v1/vms`	GET	List all VMs
`/api/v1/vms`	POST	Create VM
`/api/v1/vms/{name}`	GET	Get VM details
`/api/v1/vms/{name}/start`	POST	Start VM
`/api/v1/vms/{name}/stop`	POST	Graceful stop
`/api/v1/vms/{name}/destroy`	POST	Force stop
`/api/v1/vms/{name}`	DELETE	Delete VM

Human-in-the-Loop

Endpoint	Method	Description
`/api/v1/hitl`	GET	List pending HITL requests
`/api/v1/agents/{id}/hitl`	POST	Create HITL request for agent (returns 409 on duplicate)
`/api/v1/hitl/{id}/respond`	POST	Submit response — injects text into PTY stdin

Screen Observer

Endpoint	Method	Description
`/api/v1/sessions/{id}/screen`	GET	Current PTY screen snapshot (no WebSocket needed)
`/ws/sessions/{id}/orchestrate`	WS	Live screen updates; defaults to observer/read-only. Add `?role=controller` to allow write/resize/signal frames.
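A client that wants to switch between observer and controller roles only needs to vary the query string. The helper below is a hypothetical sketch; the base URL is left to the caller because this table does not say which port serves the endpoint.

```python
from urllib.parse import quote, urlencode


def orchestrate_url(ws_base: str, session_id: str, controller: bool = False) -> str:
    """Build the screen-observer WebSocket URL for a session.

    Without `controller`, the connection stays in the default read-only role.
    """
    path = f"/ws/sessions/{quote(session_id, safe='')}/orchestrate"
    url = ws_base.rstrip("/") + path
    if controller:
        url += "?" + urlencode({"role": "controller"})
    return url
```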

System

Endpoint	Method	Description
`/api/v1/secrets`	GET / POST / DELETE	Manage agent authentication secrets
`/api/v1/events`	GET	VM lifecycle event stream (SSE)
`/healthz`	GET	Liveness probe
`/readyz`	GET	Readiness probe
`/metrics`	GET	Prometheus metrics

gRPC (Port 8120)

service AgentService {
  rpc Connect(stream AgentMessage) returns (stream ManagementMessage);
  rpc Exec(ExecRequest) returns (stream ExecOutput);
}

WebSocket (Port 8121)

Real-time push of agent metrics, PTY output, session events, and task progress. Used by the dashboard and external monitoring clients.

Configuration

Management Server

Variable	Default	Description
`LISTEN_ADDR`	`0.0.0.0:8120`	gRPC listen address (WS = port+1, HTTP = port+2)
`SECRETS_DIR`	`.run/secrets`	Directory containing `agent-hashes.json`
`RUST_LOG`	`info`	Log level: `trace`, `debug`, `info`, `warn`, `error`
`LOG_FORMAT`	`pretty`	Log format: `pretty`, `json`, `compact`
`HEARTBEAT_TIMEOUT`	`90`	Seconds before marking agent disconnected
`METRICS_ENABLED`	`true`	Enable Prometheus metrics export
`AIWG_SERVE_ENDPOINT`	—	aiwg serve base URL (integration disabled if unset)
`AIWG_SERVE_NAME`	`agentic-sandbox`	Display name in aiwg serve dashboard

Agent Client

Variable	Required	Description
`AGENT_ID`	Yes	Unique identifier for this agent
`AGENT_SECRET`	Yes	256-bit shared secret for authentication
`MANAGEMENT_SERVER`	Yes	Server address, e.g. `192.168.122.1:8120`
`HEARTBEAT_INTERVAL`	No	Seconds between heartbeats (default: 30)

To override settings without changing your shell environment, edit management/.run/dev.env.
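If you need to mint an AGENT_SECRET by hand, any cryptographically secure source of 256 random bits will do. The sketch below uses Python's secrets module and hex encoding; the encoding the server expects is an assumption here, and the provisioning scripts normally generate per-VM secrets for you.

```python
import secrets


def generate_agent_secret() -> str:
    """Return 256 random bits as 64 hex characters (ASSUMED encoding)."""
    return secrets.token_hex(32)  # 32 bytes = 256 bits
```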

Monitoring

The management server exports Prometheus metrics at /metrics:

agentic_agents_connected         # Connected agent count
agentic_agents_ready             # Ready agents
agentic_tasks_running            # Active tasks
agentic_tasks_completed_total    # Total completed tasks
agentic_commands_total           # Commands dispatched
agentic_commands_duration_ms     # Command execution latency (histogram)
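For a quick health script that does not need a full Prometheus setup, the /metrics text can be parsed directly. This is a minimal sketch of a parser for un-labelled samples in the Prometheus text exposition format; labelled series (such as histogram buckets) are skipped, and the sample values in the usage are invented.

```python
def parse_metrics(text: str) -> dict:
    """Parse un-labelled samples from Prometheus text-format output."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments
        parts = line.split()
        name = parts[0]
        if "{" in name or len(parts) < 2:
            continue  # labelled series are skipped in this sketch
        samples[name] = float(parts[1])
    return samples


# Hypothetical payload for illustration only.
sample = """\
# TYPE agentic_agents_connected gauge
agentic_agents_connected 3
agentic_tasks_completed_total 42
agentic_commands_duration_ms_bucket{le="100"} 7
"""
print(parse_metrics(sample))
# {'agentic_agents_connected': 3.0, 'agentic_tasks_completed_total': 42.0}
```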

Set up Prometheus and AlertManager:

cd scripts/prometheus && ./deploy.sh
# Prometheus: http://localhost:9090
# AlertManager: http://localhost:9093

See docs/monitoring.md and docs/observability/ for alerting rules and dashboards.

Development

# Full cycle: rebuild server + agent, deploy to all running VMs
./scripts/dev-deploy-all.sh --debug

# Deploy agent binary to a specific VM
./scripts/deploy-agent.sh agent-01 --debug

# Management server live-reload
cd management && ./dev.sh

# E2E tests
./scripts/run-e2e-tests.sh

# Chaos tests
./scripts/chaos/run-all.sh

# Unit tests
cd management && cargo test
cd agent-rs && cargo test

Directory Structure

agentic-sandbox/
├── management/             # Management server (Rust)
│   ├── src/
│   │   ├── http/          # REST API handlers
│   │   ├── orchestrator/  # Task orchestration engine
│   │   ├── telemetry/     # Logging, metrics, tracing
│   │   ├── ws/            # WebSocket hub and connections
│   │   ├── hitl.rs        # HITL request store
│   │   ├── aiwg_serve.rs  # Outbound aiwg serve integration
│   │   ├── screen_state.rs # PTY screen observer
│   │   ├── prompt_detector.rs # HITL prompt heuristics
│   │   └── crash_loop.rs  # Crash loop detection
│   └── ui/                # Embedded web dashboard
├── agent-rs/              # Agent client (Rust)
├── cli/                   # CLI tool — VM management
├── proto/                 # gRPC protocol definitions
├── images/qemu/           # VM provisioning scripts and loadout profiles
├── scripts/               # Utility and deployment scripts
├── configs/               # Security profiles (seccomp)
├── docs/                  # Reference documentation
└── tests/e2e/             # End-to-end tests (pytest)

Documentation

Document	Description
Architecture	System design and component relationships
Positioning	Design axes and when this is (or isn't) a good fit
API Reference	Complete HTTP, gRPC, and WebSocket API
WebSocket Protocol	Per-message reference: legacy agent-scoped + formal session-registry protocols
CLI Design	`sandboxctl` operator/admin CLI taxonomy and acceptance criteria
Deployment Guide	Installation and production configuration
Operations Guide	Day-to-day operations and runbooks
Loadouts	Declarative VM provisioning manifests
Agentshare Storage	virtiofs storage layout and usage
Task Orchestration	Task API and lifecycle
Task Run Lifecycle	State machine and transitions
Session Reconciliation	Session recovery after restarts
VM Lifecycle	VM state machine and management
Troubleshooting	Common issues and fixes
Monitoring	Prometheus metrics and alerting
Observability	Full observability setup
Reliability	Reliability patterns and quickstart

Roadmap

QEMU/KVM provisioning with cloud-init
Management server (Rust/gRPC/WebSocket/HTTP)
Agent client with registration, heartbeat, and metrics
virtiofs shared storage (global/inbox)
Web dashboard with live terminal access
Task orchestration with artifact collection
Claude Code integration
sandboxctl operator/admin CLI (design)
Declarative loadout manifest system
Prometheus metrics and AlertManager alerting
Session reconciliation after server restart
VM pooling and resource quotas
PTY screen observer (server-side virtual terminal snapshots)
Human-in-the-Loop detection and REST API
aiwg serve outbound registration and event streaming
Crash loop detection and alerting
Docker runtime with rootless containers
Multi-host orchestration
Kubernetes operator

License

MIT — see LICENSE
