agentic-sandbox

agent
Security Audit: Fail
Health: Warn
  • No license: Repository has no license file
  • Description: Repository has a description
  • Active repo: Last push 0 days ago
  • Low visibility: Only 5 GitHub stars
Code: Fail
  • rm -rf: Recursive force deletion command in .gitea/workflows/ci.yaml
  • rm -rf: Recursive force deletion command in .gitea/workflows/conformance.yml
Permissions: Pass
  • Permissions: No dangerous permissions requested


SUMMARY

Self-hostable runtime for persistent autonomous coding agents — KVM-isolated VMs (or rootless containers), A2A-protocol executor with signed AgentCard discovery, AIWG mission dispatch, web dashboard, virtiofs shared storage. Runs on your hardware; no hosted control plane.

README.md

Agentic Sandbox

Self-hostable runtime for persistent autonomous coding agents.

KVM-isolated VMs (or rootless containers) for long-running agent sessions. Management server with gRPC, WebSocket, and HTTP interfaces. Web dashboard, CLI, and REST API. Runs on your hardware; no hosted control plane.

git clone https://github.com/jmagly/agentic-sandbox.git
cd agentic-sandbox && make build && cd management && ./dev.sh
# open http://localhost:8122 → "+ Create Instance" → Container → Create → done

New here? Walk through Getting Started — prerequisite check, ~15 min to first running agent.

License: MIT

Features · Quick Start · Architecture · API


Features

  • Persistent sessions. Each agent runs inside its own VM (or container) with a persistent gRPC link to the management server. Closing your terminal does not stop the agent.
  • Hardware isolation. Full KVM virtualization — each agent gets its own kernel. Rootless Docker is supported as a lighter-weight alternative.
  • Shared storage with explicit namespaces. virtiofs-backed global (read-only) and inbox (read-write per-agent) mounts.
  • Live terminal observability. Server streams every PTY chunk to the dashboard; server-side virtual terminal snapshots available via REST.
  • Human-in-the-loop. PTY heuristics detect (y/n) and similar pauses, file a HITL request, and inject your response back into stdin.
  • Restart-safe. Session reconciliation, crash-loop detection, and ephemeral per-VM secrets.
  • Resource governance. Declarative quotas and per-VM CPU/memory/disk limits.

Part of the AIWG Suite

Agentic Sandbox is the runtime substrate for the AIWG SDLC suite. AIWG provides the agents, skills, and workflow scaffolding; Agentic Sandbox provides the isolated execution environment. Either can be used independently.


Quick Start

Full walkthrough — including prerequisite verification, build-time expectations, and troubleshooting — is in docs/getting-started.md. The summary below assumes the prerequisites are already installed.

Prerequisites: a Linux host, plus the following.

  • Container path (fastest): Rust 1.75+, protoc, Docker.
  • VM path (full isolation): all of the above, plus KVM (grep -Ec '(vmx|svm)' /proc/cpuinfo should print a number greater than 0), libvirt and QEMU (apt install qemu-kvm libvirt-daemon-system), and an Ubuntu 24.04 base image (cd images/qemu && ./build-base-image.sh 24.04).

The recommended path launches the full system — management server + dashboard. From the dashboard you can create VM or container instances, attach terminal panes, and watch live events without ever touching a shell. Power-user shortcuts for skipping the dashboard are below.

Start the full system (recommended)

# 1. Build all three crates (management server, agent client, CLI)
make build      # or: ( cd management && cargo build --release ) && \
                #     ( cd agent-rs   && cargo build --release ) && \
                #     ( cd cli        && cargo build --release )

# 2. Start the management server. Dashboard is at http://localhost:8122,
#    WebSocket at ws://localhost:8121, gRPC at :8120.
cd management && ./dev.sh

# 3. Open the dashboard in a browser:
#    http://localhost:8122

In the dashboard:

  1. Click + Create Instance in the sidebar header.
  2. Pick Runtime:
    • Container — fast (~2s), backed by Docker. Choose an agent image from the dropdown (agentic/claude:latest, codex, opencode).
    • VM — full hardware isolation, ~30s–10m to provision depending on loadout. Pick a loadout (claude-only, full-suite, dual-review, etc.).
  3. Name it (agent-01, my-codex, anything matching [a-z0-9-]+).
  4. Click Create. The instance appears in the sidebar with a [VM] or [CT] badge.
  5. Click the row → click 📺 Pane to attach a terminal session.

Stop / Restart / Force off / Delete are all per-row buttons; the pane has a ⟳ Resync button if the terminal ever drifts.
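
The naming rule in step 3 can be checked before you submit a name. The sketch below is a minimal Rust check for the pattern [a-z0-9-]+ using only the standard library. The function name is hypothetical, and the server may enforce extra rules (length limits, leading hyphens) that this does not cover.

```rust
/// Returns true if `name` is non-empty and contains only lowercase ASCII
/// letters, digits, and hyphens, i.e. it matches `[a-z0-9-]+`.
/// Hypothetical helper: not part of the project's API.
fn is_valid_instance_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .bytes()
            .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'-')
}
```

With this check, agent-01 and my-codex pass, while Agent_01 and the empty string are rejected.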

Same flow from the CLI

If you'd rather not open a browser, the sandboxctl CLI (also installed as agentic-sandbox) does everything the dashboard does:

# After `make build`, install or symlink the binary:
ln -sf "$(pwd)/cli/target/release/sandboxctl" ~/.local/bin/

# Configure a context pointing at the local management server (one-time)
sandboxctl config set-context local --server http://localhost:8122

# Spawn a container-runtime agent
sandboxctl container create agent-01 --image agentic/claude:latest

# Or a VM-runtime agent
sandboxctl vm create agent-02 --loadout profiles/claude-only.yaml --agentshare --start

# List instances
sandboxctl agent list

# Find a session on the agent, then attach (Ctrl-A d to detach)
sandboxctl session list --agent agent-01
sandboxctl session attach <session-id> --write

# Submit a long-running task from a manifest file
cat > task.yaml <<'EOF'
prompt: "Refactor the authentication module to use JWT refresh tokens"
repository: "https://github.com/myorg/myapp"
model: "claude-opus-4-6"
timeout_seconds: 7200
EOF
sandboxctl task submit --file task.yaml --wait

Run sandboxctl --help for the full command tree, which is organized noun-first (agent / session / container / vm / task / hitl / loadout / storage / event / health / ops).

Advanced: skip the dashboard, provision a VM directly

For air-gapped boxes, scripted environments, or when you want a single VM without running the management server, drive the provisioner directly:

./images/qemu/provision-vm.sh agent-01 \
  --loadout profiles/claude-only.yaml \
  --agentshare \
  --start

# The agent inside the VM will try to dial host.internal:8120 in a loop.
# Start the management server first if you want gRPC + the dashboard;
# otherwise the VM is still SSH-reachable as a plain isolated environment:
ssh -i /var/lib/agentic-sandbox/secrets/ssh-keys/agent-01 agent@<vm-ip>

Useful flags: --profile basic (minimal cloud-init), --cpus 8 --memory 16G --disk 100G, --network-mode isolated|allowlist|full. See images/qemu/README.md for the full reference.

Submit a task via REST

If you're scripting against the API directly:

curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Refactor the authentication module to use JWT refresh tokens",
    "repository": "https://github.com/myorg/myapp",
    "model": "claude-opus-4-6",
    "timeout_seconds": 7200
  }'

For the full provisioning, profile, and loadout reference, see docs/LOADOUTS.md and the Provisioning section below.


Architecture

Topology

Host
├── agent-01 (KVM VM)   192.168.122.201
│   ├── Claude Code
│   ├── Rust toolchain
│   └── agent-client → gRPC → Management Server
├── agent-02 (KVM VM)   192.168.122.202
│   └── agent-client → gRPC → Management Server
└── Management Server   :8120 gRPC  :8121 WS  :8122 HTTP

Each agent runs in a QEMU/KVM virtual machine provisioned from a cloud-init manifest. VMs are first-class objects with independent CPU, memory, and disk quotas, isolated libvirt networking, and ephemeral per-VM secrets. Docker containers are supported as a lighter-weight alternative for faster iteration.

Management Server

A Rust async server (Tokio, Tonic, Axum) that coordinates all connected agents:

┌─────────────────────────────────────────────────────────────┐
│                  Management Server (Rust)                    │
│                                                              │
│  gRPC :8120          WebSocket :8121        HTTP :8122       │
│  ┌──────────────┐    ┌───────────────┐    ┌──────────────┐  │
│  │ AgentService │    │ WebSocketHub  │    │ HTTP API     │  │
│  │ Connect()    │    │ terminal I/O  │    │ dashboard    │  │
│  │ Exec()       │    │ metrics push  │    │ REST CRUD    │  │
│  └──────────────┘    └───────────────┘    └──────────────┘  │
│                                                              │
│  AgentRegistry  CommandDispatcher  OutputAggregator          │
│  HitlStore      ScreenRegistry     CrashLoopDetector         │
│  TaskOrchestrator                  AiwgServeHandle           │
└─────────────────────────────────────────────────────────────┘

Agent state — heartbeats, metrics, setup progress, loadout metadata — is tracked in-memory via DashMap and exposed through all three interfaces.

Task Orchestrator

Submit long-running AI tasks. The orchestrator assigns each task to an available VM, monitors it through completion, and streams its logs via SSE. A task moves through these states:

PENDING → STAGING → PROVISIONING → READY → RUNNING → COMPLETING → COMPLETED
                                                  ↘                ↘
                                               FAILED           CANCELLED
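
One way to model the lifecycle above is an enum with an explicit transition check. This is a sketch, not the orchestrator's actual code: the type and method names are hypothetical, and it assumes that any non-terminal state may move to FAILED or CANCELLED, which the diagram does not spell out. See docs/task-run-lifecycle.md for the authoritative state machine.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskState {
    Pending,
    Staging,
    Provisioning,
    Ready,
    Running,
    Completing,
    Completed,
    Failed,
    Cancelled,
}

impl TaskState {
    /// Terminal states never transition again.
    fn is_terminal(self) -> bool {
        matches!(
            self,
            TaskState::Completed | TaskState::Failed | TaskState::Cancelled
        )
    }

    /// Next state on the success path, or None for a terminal state.
    fn next(self) -> Option<TaskState> {
        use TaskState::*;
        match self {
            Pending => Some(Staging),
            Staging => Some(Provisioning),
            Provisioning => Some(Ready),
            Ready => Some(Running),
            Running => Some(Completing),
            Completing => Some(Completed),
            Completed | Failed | Cancelled => None,
        }
    }

    /// Assumption: every non-terminal state may fail or be cancelled.
    fn can_transition_to(self, to: TaskState) -> bool {
        if self.is_terminal() {
            return false;
        }
        match to {
            TaskState::Failed | TaskState::Cancelled => true,
            _ => self.next() == Some(to),
        }
    }
}
```

Rejecting illegal transitions in one place keeps a skipped step (for example PENDING straight to RUNNING) from slipping through.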

Tasks receive a dedicated workspace in agentshare:

/srv/agentshare/
├── tasks/{task_id}/manifest.yaml   # Task metadata
├── inbox/{task_id}/                # Input files (read-only inside VM)
└── outbox/{task_id}/               # Artifacts written by agent
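
The layout above maps directly to three paths per task. A minimal sketch of building them, assuming the root is /srv/agentshare as shown; the struct and function names are hypothetical, and the task ID is not sanitized here.

```rust
use std::path::{Path, PathBuf};

/// Host-side locations for one task, following the layout above.
struct TaskWorkspace {
    manifest: PathBuf,
    inbox: PathBuf,
    outbox: PathBuf,
}

/// Hypothetical helper: joins the agentshare root with the per-task paths.
fn task_workspace(root: &Path, task_id: &str) -> TaskWorkspace {
    TaskWorkspace {
        manifest: root.join("tasks").join(task_id).join("manifest.yaml"),
        inbox: root.join("inbox").join(task_id),
        outbox: root.join("outbox").join(task_id),
    }
}
```

For a task ID of task-123 this yields /srv/agentshare/outbox/task-123 as the artifact directory.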

Agentshare Storage

VMs get virtiofs-mounted shared storage with separate read-only and read-write namespaces:

Mount | VM Path | Mode | Purpose
Global | /mnt/global (~/global) | Read-only | Shared tools, prompts, configs
Inbox | /mnt/inbox (~/inbox) | Read-write | Task inputs, run logs, outputs

The inbox layout provides structured access patterns — agents find their task workspace at ~/inbox/current/ without needing to know task IDs.

Human-in-the-Loop (HITL)

The management server monitors PTY output and automatically detects when an agent is waiting for human input. Detection runs after every output chunk through a scored heuristic that recognizes patterns like (y/n), [Y/n], Human:, and explicit confirmation phrases.

Agent PTY output
      │
      ▼
prompt_detector::detect_prompt()   ← scores output chunk
      │
  score ≥ 0.85
      │
      ▼
HitlStore::create()                ← deduplicates per session
      │
      ├── REST: GET /api/v1/hitl          (operator polls)
      ├── Dashboard: pending requests UI
      └── AiwgServeHandle::emit()         (if aiwg serve wired in)
                    │
              operator responds
                    │
                    ▼
POST /api/v1/hitl/{id}/respond     ← injects text into PTY stdin

One pending request per session at a time — duplicate detections are suppressed until the active request is resolved.
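
To make the scoring step concrete, here is a small sketch of a scored prompt heuristic. Only the 0.85 threshold and the (y/n), [Y/n], and Human: patterns come from the description above. The weights, the confirmation phrases, and the function names are invented for illustration and are not the values used by prompt_detector.rs.

```rust
/// Threshold taken from the diagram above.
const HITL_THRESHOLD: f64 = 0.85;

/// Scores the last non-empty line of an output chunk.
/// All weights below are hypothetical.
fn score_chunk(chunk: &str) -> f64 {
    let last = chunk
        .lines()
        .rev()
        .find(|l| !l.trim().is_empty())
        .unwrap_or("")
        .trim();
    let lower = last.to_lowercase();
    let mut score: f64 = 0.0;
    // Explicit yes/no markers are strong signals on their own.
    if lower.contains("(y/n)") || lower.contains("[y/n]") {
        score += 0.9;
    }
    // A trailing "Human:" turn marker.
    if last.ends_with("Human:") {
        score += 0.9;
    }
    // Confirmation phrases are weaker and need a second signal.
    if lower.contains("are you sure") || lower.contains("do you want to continue") {
        score += 0.6;
    }
    if last.ends_with('?') {
        score += 0.3;
    }
    score.min(1.0)
}

fn is_waiting_for_input(chunk: &str) -> bool {
    score_chunk(chunk) >= HITL_THRESHOLD
}
```

Combining weak signals lets a plain question such as "What changed?" stay below the threshold, while "Are you sure?" crosses it.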

aiwg Serve Integration

When AIWG_SERVE_ENDPOINT is set, the management server registers with an aiwg serve dashboard and streams live sandbox events over a persistent authenticated WebSocket. The integration reconnects with exponential backoff (1 s → 30 s) and never blocks server startup.
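
The reconnect schedule can be sketched as a pure function of the attempt number. The 1 s floor and 30 s cap come from the text above. The doubling factor and the absence of jitter are assumptions, and the function name is hypothetical.

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based):
/// 1 s, 2 s, 4 s, 8 s, 16 s, then capped at 30 s.
/// Assumption: the delay doubles on each attempt.
fn reconnect_delay(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 1;
    const MAX_SECS: u64 = 30;
    // Saturating arithmetic keeps very large attempt counts from overflowing.
    let secs = BASE_SECS
        .saturating_mul(2u64.saturating_pow(attempt))
        .min(MAX_SECS);
    Duration::from_secs(secs)
}
```

A caller would typically run the reconnect loop in a background task and reset the attempt counter after a successful connection, which is consistent with the integration never blocking server startup.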

The sandbox also registers as an AIWG executor (per executor.v1.md). It accepts mission dispatches via POST /api/v1/sessions/:id/dispatch and reports the full mission.* lifecycle (assigned → started → completed/failed/aborted, with HITL and resumability) over a second WebSocket at /ws/executors/{id}. Mission state persists across management server restarts in <secrets_dir>/../missions.json. Full integration spec: docs/aiwg-executor.md.

Event | Trigger
agent.connected | gRPC stream registered
agent.disconnected | gRPC stream closed or timed out
agent.ready | cloud-init provisioning complete
agent.provisioning | loadout step progress
session.start / session.end | PTY/exec session lifecycle
hitl.input_required | HITL prompt detected

A Real Walkthrough

What a typical autonomous coding task looks like end to end.

Provision

./images/qemu/provision-vm.sh agent-01 \
  --loadout profiles/claude-only.yaml \
  --agentshare \
  --start

The VM boots, cloud-init runs the loadout manifest, agent-client registers via gRPC, and the status transitions Starting → Provisioning → Ready. If aiwg serve is configured, agent.ready fires.

Submit a Task

curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Refactor the authentication module to use JWT refresh tokens",
    "repository": "https://github.com/myorg/myapp",
    "model": "claude-opus-4-6",
    "timeout_seconds": 7200
  }'

The task is assigned to agent-01, the repository is cloned into the inbox, and Claude Code is launched inside the VM.

Monitor in Real Time

Open http://localhost:8122 for the live terminal stream, or:

curl http://localhost:8122/api/v1/tasks/{task_id}/logs

Agent Pauses — HITL

An hour in, Claude Code hits an ambiguous refactor decision and prints a confirmation prompt. The dashboard shows a pending HITL request. Respond without opening a terminal:

curl -X POST http://localhost:8122/api/v1/hitl/{hitl_id}/respond \
  -H "Content-Type: application/json" \
  -d '{"response": "yes, update all callers"}'

The response text is injected into the agent's PTY stdin and the agent continues.

Collect Artifacts

ls /srv/agentshare/outbox/{task_id}/
# auth-module/  jwt-refresh.ts  test-results.json  SUMMARY.md

Provisioning

Profiles

Pre-built profiles for common setups:

Profile | Tools | Use Case
agentic-dev | Python (uv), Node.js (fnm), Go, Rust, Claude Code, Aider, Docker, ripgrep, fd, jq | Full development environment
basic | SSH, basic utilities | Minimal; custom setup via cloud-init

./images/qemu/provision-vm.sh my-agent \
  --profile agentic-dev \
  --cpus 8 \
  --memory 16384 \
  --disk 100G \
  --agentshare \
  --start

Loadout Manifests

Declarative YAML manifests for composable provisioning. Loadouts specify tools, runtimes, AI providers, and AIWG frameworks without modifying base profiles:

# profiles/claude-only.yaml
name: claude-only
tools:
  - claude-code
  - ripgrep
  - fd
  - jq
runtimes:
  - python-uv
  - nodejs-fnm
aiwg_frameworks:
  - name: sdlc-complete
    providers: [claude]

See docs/LOADOUTS.md for the full manifest schema and available options.


Task Orchestration

Submit tasks to agents via the REST API. The orchestrator assigns tasks to available VMs, manages the workspace, and tracks lifecycle state.

# Submit a task
curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Audit the API for SQL injection vulnerabilities",
    "repository": "https://github.com/myorg/myapp",
    "model": "claude-opus-4-6",
    "timeout_seconds": 3600
  }'

# Check status
curl http://localhost:8122/api/v1/tasks/{task_id}

# Stream logs (SSE)
curl http://localhost:8122/api/v1/tasks/{task_id}/logs

# List artifacts
curl http://localhost:8122/api/v1/tasks/{task_id}/artifacts

# List A2A task artifacts captured by messages:send
curl http://localhost:8122/agents/{instance_id}/v1/tasks/{task_id}/artifacts

See docs/task-orchestration-api.md for full API details and docs/task-run-lifecycle.md for the lifecycle state machine.
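
If you consume the log stream from your own client, you need to split the SSE response into events. The sketch below follows the standard Server-Sent Events framing (data: lines, a blank line ends an event, lines starting with a colon are comments). The function name is hypothetical, and the shape of each log payload is not specified here, so the payload is returned as an opaque string.

```rust
/// Collects the `data:` payload of each complete event in an SSE stream.
/// Other fields (event:, id:, retry:) and comment lines are ignored.
fn sse_data_events(stream: &str) -> Vec<String> {
    let mut events = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for line in stream.lines() {
        if line.is_empty() {
            // A blank line terminates the current event.
            if !current.is_empty() {
                events.push(current.join("\n"));
                current.clear();
            }
        } else if let Some(rest) = line.strip_prefix("data:") {
            // One optional space after the colon is not part of the payload.
            current.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
    }
    // An event without a terminating blank line is incomplete and dropped.
    events
}
```

Multiple data: lines in one event are joined with newlines, so a multi-line log entry arrives as a single string.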


Human-in-the-Loop (HITL)

The server monitors agent PTY output and automatically detects when an agent is waiting for human input. When detected, a HITL request is created and held until resolved.

# List pending requests
curl http://localhost:8122/api/v1/hitl

# Respond — text is injected directly into the agent's PTY stdin
curl -X POST http://localhost:8122/api/v1/hitl/a3f1b2.../respond \
  -H "Content-Type: application/json" \
  -d '{"response": "y"}'

Requests are deduplicated per session — a second prompt won't fire while the first is pending. Once resolved, the slot opens again.
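
The one-pending-request-per-session rule can be sketched with a map keyed by session ID. This is an illustration only: the type names, the numeric request IDs, and the error shape are hypothetical and do not describe the real HitlStore.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum CreateError {
    /// The session already has an unresolved request.
    AlreadyPending { existing_id: u64 },
}

#[derive(Default)]
struct PendingHitl {
    next_id: u64,
    /// Session ID mapped to its single pending request ID.
    by_session: HashMap<String, u64>,
}

impl PendingHitl {
    /// Creates a request unless the session already has one pending.
    fn create(&mut self, session_id: &str) -> Result<u64, CreateError> {
        if let Some(&existing_id) = self.by_session.get(session_id) {
            return Err(CreateError::AlreadyPending { existing_id });
        }
        self.next_id += 1;
        self.by_session.insert(session_id.to_string(), self.next_id);
        Ok(self.next_id)
    }

    /// Resolves a request and frees the session's slot.
    /// Returns false if no pending request has that ID.
    fn resolve(&mut self, request_id: u64) -> bool {
        let before = self.by_session.len();
        self.by_session.retain(|_, id| *id != request_id);
        self.by_session.len() != before
    }
}
```

An HTTP layer could map AlreadyPending to the 409 response that the API reference lists for duplicate requests.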


VM Lifecycle

# Provision and start
./images/qemu/provision-vm.sh agent-01 --profile agentic-dev --agentshare --start

# Lifecycle management
virsh start agent-01          # start stopped VM
virsh shutdown agent-01       # graceful stop
virsh destroy agent-01        # force stop

# Rebuild (preserves IP and config)
./scripts/reprovision-vm.sh agent-01 --profile agentic-dev

# Remove completely
./scripts/destroy-vm.sh agent-01

# Deploy updated agent binary to running VM
./scripts/deploy-agent.sh agent-01 --debug

See docs/vm-lifecycle.md for the state machine and docs/LIFECYCLE.md for the full operations reference.


API Reference

Agents

Endpoint | Method | Description
/api/v1/agents | GET | List registered agents with metrics and loadout info
/api/v1/agents/{id} | GET | Get agent details
/api/v1/agents/{id} | DELETE | Remove agent
/api/v1/agents/{id}/start | POST | Start agent VM
/api/v1/agents/{id}/stop | POST | Stop agent VM
/api/v1/agents/{id}/destroy | POST | Force destroy agent VM
/api/v1/agents/{id}/reprovision | POST | Reprovision agent VM

Tasks

Endpoint | Method | Description
/api/v1/tasks | GET | List tasks
/api/v1/tasks | POST | Submit new task
/api/v1/tasks/{id} | GET | Get task status and metadata
/api/v1/tasks/{id} | DELETE | Cancel task
/api/v1/tasks/{id}/logs | GET | Stream task logs (SSE)
/api/v1/tasks/{id}/artifacts | GET | List task artifacts
/agents/{instance_id}/v1/tasks/{task_id}/artifacts | GET | List persisted A2A task artifacts
/agents/{instance_id}/v1/tasks/{task_id}/artifacts/{artifact_id} | GET | Return one persisted A2A task artifact

VMs

Endpoint | Method | Description
/api/v1/vms | GET | List all VMs
/api/v1/vms | POST | Create VM
/api/v1/vms/{name} | GET | Get VM details
/api/v1/vms/{name}/start | POST | Start VM
/api/v1/vms/{name}/stop | POST | Graceful stop
/api/v1/vms/{name}/destroy | POST | Force stop
/api/v1/vms/{name} | DELETE | Delete VM

Human-in-the-Loop

Endpoint | Method | Description
/api/v1/hitl | GET | List pending HITL requests
/api/v1/agents/{id}/hitl | POST | Create HITL request for agent (returns 409 on duplicate)
/api/v1/hitl/{id}/respond | POST | Submit response; injects text into PTY stdin

Screen Observer

Endpoint | Method | Description
/api/v1/sessions/{id}/screen | GET | Current PTY screen snapshot (no WebSocket needed)
/ws/sessions/{id}/orchestrate | WS | Live screen updates; defaults to observer/read-only. Add ?role=controller to allow write/resize/signal frames.

System

Endpoint | Method | Description
/api/v1/secrets | GET / POST / DELETE | Manage agent authentication secrets
/api/v1/events | GET | VM lifecycle event stream (SSE)
/healthz | GET | Liveness probe
/readyz | GET | Readiness probe
/metrics | GET | Prometheus metrics

gRPC (Port 8120)

service AgentService {
  rpc Connect(stream AgentMessage) returns (stream ManagementMessage);
  rpc Exec(ExecRequest) returns (stream ExecOutput);
}

WebSocket (Port 8121)

Real-time push of agent metrics, PTY output, session events, and task progress. Used by the dashboard and external monitoring clients.


Configuration

Management Server

Variable | Default | Description
LISTEN_ADDR | 0.0.0.0:8120 | gRPC listen address (WS = port+1, HTTP = port+2)
SECRETS_DIR | .run/secrets | Directory containing agent-hashes.json
RUST_LOG | info | Log level: trace, debug, info, warn, error
LOG_FORMAT | pretty | Log format: pretty, json, compact
HEARTBEAT_TIMEOUT | 90 | Seconds before marking agent disconnected
METRICS_ENABLED | true | Enable Prometheus metrics export
AIWG_SERVE_ENDPOINT | (unset) | aiwg serve base URL (integration disabled if unset)
AIWG_SERVE_NAME | agentic-sandbox | Display name in aiwg serve dashboard
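
Because the WebSocket and HTTP ports are derived from LISTEN_ADDR, changing the gRPC port moves all three. A minimal sketch of that derivation, assuming LISTEN_ADDR is always an IP:port pair (a hostname would not parse here); the struct and function names are hypothetical.

```rust
use std::net::SocketAddr;

#[derive(Debug, PartialEq)]
struct Ports {
    grpc: u16,
    websocket: u16,
    http: u16,
}

/// Derives the three listener ports from a LISTEN_ADDR value.
/// Returns None if the address does not parse or the derived
/// ports would exceed 65535.
fn derive_ports(listen_addr: &str) -> Option<Ports> {
    let addr: SocketAddr = listen_addr.parse().ok()?;
    let grpc = addr.port();
    Some(Ports {
        grpc,
        websocket: grpc.checked_add(1)?,
        http: grpc.checked_add(2)?,
    })
}
```

With the default 0.0.0.0:8120 this gives 8120, 8121, and 8122, matching the ports used throughout this README.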

Agent Client

Variable | Required | Description
AGENT_ID | Yes | Unique identifier for this agent
AGENT_SECRET | Yes | 256-bit shared secret for authentication
MANAGEMENT_SERVER | Yes | Server address, e.g. 192.168.122.1:8120
HEARTBEAT_INTERVAL | No | Seconds between heartbeats (default: 30)
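
With the defaults, an agent sends a heartbeat every 30 seconds and the server allows 90 seconds of silence, so roughly three heartbeats can be missed before the agent is marked disconnected. The check can be sketched as follows. The function name is hypothetical, and treating exactly 90 seconds as still connected is an assumption.

```rust
use std::time::{Duration, Instant};

/// Default HEARTBEAT_TIMEOUT from the Management Server table.
const HEARTBEAT_TIMEOUT: Duration = Duration::from_secs(90);

/// True once more than HEARTBEAT_TIMEOUT has passed since the last heartbeat.
/// saturating_duration_since returns zero if `now` is earlier than
/// `last_heartbeat`, so clock oddities never cause a panic.
fn is_disconnected(last_heartbeat: Instant, now: Instant) -> bool {
    now.saturating_duration_since(last_heartbeat) > HEARTBEAT_TIMEOUT
}
```

If you raise HEARTBEAT_INTERVAL on the agent, keep HEARTBEAT_TIMEOUT on the server comfortably larger, or healthy agents will flap between connected and disconnected.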

To override settings without changing your shell environment, edit management/.run/dev.env.


Monitoring

The management server exports Prometheus metrics at /metrics:

agentic_agents_connected         # Connected agent count
agentic_agents_ready             # Ready agents
agentic_tasks_running            # Active tasks
agentic_tasks_completed_total    # Total completed tasks
agentic_commands_total           # Commands dispatched
agentic_commands_duration_ms     # Command execution latency (histogram)
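
For a quick health check without a Prometheus server, you can read a single value out of the /metrics response. The sketch below parses the Prometheus text exposition format for unlabeled samples only. The function name is hypothetical, and labeled series such as histogram buckets are not handled.

```rust
/// Finds the value of an unlabeled sample named `metric`
/// in Prometheus text exposition output.
fn metric_value(exposition: &str, metric: &str) -> Option<f64> {
    exposition.lines().find_map(|line| {
        let line = line.trim();
        // Skip "# HELP" and "# TYPE" comment lines.
        if line.starts_with('#') {
            return None;
        }
        let mut parts = line.split_whitespace();
        let name = parts.next()?;
        if name != metric {
            return None;
        }
        parts.next()?.parse::<f64>().ok()
    })
}
```

Fetching the /metrics body is left to whichever HTTP client you already use; pass the response text to metric_value with a name such as agentic_agents_connected.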

Set up Prometheus and AlertManager:

cd scripts/prometheus && ./deploy.sh
# Prometheus: http://localhost:9090
# AlertManager: http://localhost:9093

See docs/monitoring.md and docs/observability/ for alerting rules and dashboards.


Development

# Full cycle: rebuild server + agent, deploy to all running VMs
./scripts/dev-deploy-all.sh --debug

# Deploy agent binary to a specific VM
./scripts/deploy-agent.sh agent-01 --debug

# Management server live-reload
cd management && ./dev.sh

# E2E tests
./scripts/run-e2e-tests.sh

# Chaos tests
./scripts/chaos/run-all.sh

# Unit tests
cd management && cargo test
cd agent-rs && cargo test

Directory Structure

agentic-sandbox/
├── management/             # Management server (Rust)
│   ├── src/
│   │   ├── http/          # REST API handlers
│   │   ├── orchestrator/  # Task orchestration engine
│   │   ├── telemetry/     # Logging, metrics, tracing
│   │   ├── ws/            # WebSocket hub and connections
│   │   ├── hitl.rs        # HITL request store
│   │   ├── aiwg_serve.rs  # Outbound aiwg serve integration
│   │   ├── screen_state.rs # PTY screen observer
│   │   ├── prompt_detector.rs # HITL prompt heuristics
│   │   └── crash_loop.rs  # Crash loop detection
│   └── ui/                # Embedded web dashboard
├── agent-rs/              # Agent client (Rust)
├── cli/                   # CLI tool — VM management
├── proto/                 # gRPC protocol definitions
├── images/qemu/           # VM provisioning scripts and loadout profiles
├── scripts/               # Utility and deployment scripts
├── configs/               # Security profiles (seccomp)
├── docs/                  # Reference documentation
└── tests/e2e/             # End-to-end tests (pytest)

Documentation

Document | Description
Architecture | System design and component relationships
Positioning | Design axes and when this is (or isn't) a good fit
API Reference | Complete HTTP, gRPC, and WebSocket API
WebSocket Protocol | Per-message reference: legacy agent-scoped + formal session-registry protocols
CLI Design | sandboxctl operator/admin CLI taxonomy and acceptance criteria
Deployment Guide | Installation and production configuration
Operations Guide | Day-to-day operations and runbooks
Loadouts | Declarative VM provisioning manifests
Agentshare Storage | virtiofs storage layout and usage
Task Orchestration | Task API and lifecycle
Task Run Lifecycle | State machine and transitions
Session Reconciliation | Session recovery after restarts
VM Lifecycle | VM state machine and management
Troubleshooting | Common issues and fixes
Monitoring | Prometheus metrics and alerting
Observability | Full observability setup
Reliability | Reliability patterns and quickstart

Roadmap

  • QEMU/KVM provisioning with cloud-init
  • Management server (Rust/gRPC/WebSocket/HTTP)
  • Agent client with registration, heartbeat, and metrics
  • virtiofs shared storage (global/inbox)
  • Web dashboard with live terminal access
  • Task orchestration with artifact collection
  • Claude Code integration
  • sandboxctl operator/admin CLI (design)
  • Declarative loadout manifest system
  • Prometheus metrics and AlertManager alerting
  • Session reconciliation after server restart
  • VM pooling and resource quotas
  • PTY screen observer (server-side virtual terminal snapshots)
  • Human-in-the-Loop detection and REST API
  • aiwg serve outbound registration and event streaming
  • Crash loop detection and alerting
  • Docker runtime with rootless containers
  • Multi-host orchestration
  • Kubernetes operator

License

MIT — see LICENSE
