Self-hostable runtime for persistent autonomous coding agents — KVM-isolated VMs (or rootless containers), A2A-protocol executor with signed AgentCard discovery, AIWG mission dispatch, web dashboard, virtiofs shared storage. Runs on your hardware; no hosted control plane.
Agentic Sandbox
Self-hostable runtime for persistent autonomous coding agents.
KVM-isolated VMs (or rootless containers) for long-running agent sessions. Management server with gRPC, WebSocket, and HTTP interfaces. Web dashboard, CLI, and REST API. Runs on your hardware; no hosted control plane.
```shell
git clone https://github.com/jmagly/agentic-sandbox.git
cd agentic-sandbox && make build && cd management && ./dev.sh
# open http://localhost:8122 → "+ Create Instance" → Container → Create → done
```
New here? Walk through Getting Started — prerequisite check, ~15 min to first running agent.
Features · Quick Start · Architecture · API
Features
- Persistent sessions. Each agent runs inside its own VM (or container) with a persistent gRPC link to the management server. Closing your terminal does not stop the agent.
- Hardware isolation. Full KVM virtualization — each agent gets its own kernel. Rootless Docker is supported as a lighter-weight alternative.
- Shared storage with explicit namespaces. virtiofs-backed `global` (read-only) and `inbox` (read-write per-agent) mounts.
- Live terminal observability. Server streams every PTY chunk to the dashboard; server-side virtual terminal snapshots available via REST.
- Human-in-the-loop. PTY heuristics detect `(y/n)` and similar pauses, file a HITL request, and inject your response back into stdin.
- Restart-safe. Session reconciliation, crash-loop detection, and ephemeral per-VM secrets.
- Resource governance. Declarative quotas and per-VM CPU/memory/disk limits.
Part of the AIWG Suite
Agentic Sandbox is the runtime substrate for the AIWG SDLC suite. AIWG provides the agents, skills, and workflow scaffolding; Agentic Sandbox provides the isolated execution environment. Either can be used independently.
Quick Start
Full walkthrough — including prerequisite verification, build-time expectations, and troubleshooting — is in docs/getting-started.md. The summary below assumes the prerequisites are already installed.
Prerequisites: Linux host. For the container path (fastest): Rust 1.75+, `protoc`, Docker. For the VM path (full isolation): all of the above plus KVM (`grep -Ec '(vmx|svm)' /proc/cpuinfo` must print a number greater than 0), libvirt + QEMU (`apt install qemu-kvm libvirt-daemon-system`), and an Ubuntu 24.04 base image (`cd images/qemu && ./build-base-image.sh 24.04`).
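The KVM check in the prerequisites can be wrapped in a small function for setup scripts. This is a sketch rather than something shipped with the repository: the name `count_virt_lines` and its optional file argument are assumptions, and `grep -E` is used as the current spelling of `egrep`.

```shell
# Count the lines in cpuinfo that mention hardware virtualization
# (Intel vmx or AMD svm). The optional argument points at a saved cpuinfo
# file and defaults to the live /proc/cpuinfo. Hypothetical helper.
count_virt_lines() {
    # grep -c exits 1 when the count is 0 and 2 when the file is missing,
    # so the status is ignored and an empty result is reported as 0.
    n=$(grep -Ec '(vmx|svm)' "${1:-/proc/cpuinfo}" 2>/dev/null) || true
    echo "${n:-0}"
}

# Choose the VM path only when at least one matching line is found.
if [ "$(count_virt_lines)" -gt 0 ]; then
    echo "vmx/svm flag found: VM path available"
else
    echo "no vmx/svm flag found: use the container path"
fi
```

A non-zero count only shows that the CPU advertises the feature; virtualization can still be disabled in firmware.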
The recommended path launches the full system — management server + dashboard. From the dashboard you can create VM or container instances, attach terminal panes, and watch live events without ever touching a shell. Power-user shortcuts for skipping the dashboard are below.
Start the full system (recommended)
```shell
# 1. Build all three crates (management server, agent client, CLI)
make build   # or: ( cd management && cargo build --release ) && \
             #     ( cd agent-rs && cargo build --release ) && \
             #     ( cd cli && cargo build --release )

# 2. Start the management server. Dashboard is at http://localhost:8122,
#    WebSocket at ws://localhost:8121, gRPC at :8120.
cd management && ./dev.sh

# 3. Open the dashboard in a browser:
#    http://localhost:8122
```
In the dashboard:
- Click + Create Instance in the sidebar header.
- Pick Runtime:
  - Container: fast (~2s), backed by Docker. Choose an agent image from the dropdown (`agentic/claude:latest`, `codex`, `opencode`).
  - VM: full hardware isolation, ~30s–10m to provision depending on loadout. Pick a loadout (`claude-only`, `full-suite`, `dual-review`, etc.).
- Name it (`agent-01`, `my-codex`, anything matching `[a-z0-9-]+`).
- Click Create. The instance appears in the sidebar with a `[VM]` or `[CT]` badge.
- Click the row → click 📺 Pane to attach a terminal session.
Stop / Restart / Force off / Delete are all per-row buttons; the pane has a ⟳ Resync button if the terminal ever drifts.
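When instances are created from scripts rather than the dashboard, the `[a-z0-9-]+` naming rule from the steps above can be checked up front. The helper below is a hypothetical sketch; the server may enforce further rules (length limits, reserved names) that are not described here.

```shell
# Succeed when NAME is non-empty and contains only lowercase letters,
# digits, and hyphens. The body runs in a subshell so that forcing the
# C locale (byte-wise a-z ranges) does not leak into the caller.
valid_instance_name() (
    LC_ALL=C
    case "$1" in
        ''|*[!a-z0-9-]*) exit 1 ;;   # empty, or has a disallowed character
    esac
    exit 0
)

for name in agent-01 my-codex Agent_01 ""; do
    if valid_instance_name "$name"; then
        echo "ok:       '$name'"
    else
        echo "rejected: '$name'"
    fi
done
```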
Same flow from the CLI
If you'd rather not open a browser, the sandboxctl CLI (also installed as agentic-sandbox) does everything the dashboard does:
```shell
# After `make build`, install or symlink the binary:
ln -sf "$(pwd)/cli/target/release/sandboxctl" ~/.local/bin/

# Configure a context pointing at the local management server (one-time)
sandboxctl config set-context local --server http://localhost:8122

# Spawn a container-runtime agent
sandboxctl container create agent-01 --image agentic/claude:latest

# Or a VM-runtime agent
sandboxctl vm create agent-02 --loadout profiles/claude-only.yaml --agentshare --start

# List instances
sandboxctl agent list

# Find a session on the agent, then attach (Ctrl-A d to detach)
sandboxctl session list --agent agent-01
sandboxctl session attach <session-id> --write

# Submit a long-running task from a manifest file
cat > task.yaml <<'EOF'
prompt: "Refactor the authentication module to use JWT refresh tokens"
repository: "https://github.com/myorg/myapp"
model: "claude-opus-4-6"
timeout_seconds: 7200
EOF
sandboxctl task submit --file task.yaml --wait
```
Run sandboxctl --help for the full noun-first verb tree (agent / session / container / vm / task / hitl / loadout / storage / event / health / ops).
Advanced: skip the dashboard, provision a VM directly
For air-gapped boxes, scripted environments, or when you want a single VM without running the management server, drive the provisioner directly:
```shell
./images/qemu/provision-vm.sh agent-01 \
  --loadout profiles/claude-only.yaml \
  --agentshare \
  --start

# The agent inside the VM will try to dial host.internal:8120 in a loop.
# Start the management server first if you want gRPC + the dashboard;
# otherwise the VM is still SSH-reachable as a plain isolated environment:
ssh -i /var/lib/agentic-sandbox/secrets/ssh-keys/agent-01 agent@<vm-ip>
```
Useful flags: --profile basic (minimal cloud-init), --cpus 8 --memory 16G --disk 100G, --network-mode isolated|allowlist|full. See images/qemu/README.md for the full reference.
Submit a task via REST
If you're scripting against the API directly:
```shell
curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Refactor the authentication module to use JWT refresh tokens",
    "repository": "https://github.com/myorg/myapp",
    "model": "claude-opus-4-6",
    "timeout_seconds": 7200
  }'
```
For the full provisioning, profile, and loadout reference, see docs/LOADOUTS.md and the Provisioning section below.
Architecture
Topology
```text
Host
├── agent-01 (KVM VM)  192.168.122.201
│   ├── Claude Code
│   ├── Rust toolchain
│   └── agent-client → gRPC → Management Server
├── agent-02 (KVM VM)  192.168.122.202
│   └── agent-client → gRPC → Management Server
└── Management Server  :8120 gRPC  :8121 WS  :8122 HTTP
```
Each agent runs in a QEMU/KVM virtual machine provisioned from a cloud-init manifest. VMs are first-class objects with independent CPU, memory, and disk quotas, isolated libvirt networking, and ephemeral per-VM secrets. Docker containers are supported as a lighter-weight alternative for faster iteration.
Management Server
A Rust async server (Tokio, Tonic, Axum) that coordinates all connected agents:
```text
┌─────────────────────────────────────────────────────────────┐
│                  Management Server (Rust)                   │
│                                                             │
│  gRPC :8120         WebSocket :8121     HTTP :8122          │
│  ┌──────────────┐   ┌───────────────┐   ┌──────────────┐    │
│  │ AgentService │   │ WebSocketHub  │   │ HTTP API     │    │
│  │ Connect()    │   │ terminal I/O  │   │ dashboard    │    │
│  │ Exec()       │   │ metrics push  │   │ REST CRUD    │    │
│  └──────────────┘   └───────────────┘   └──────────────┘    │
│                                                             │
│  AgentRegistry      CommandDispatcher   OutputAggregator    │
│  HitlStore          ScreenRegistry      CrashLoopDetector   │
│  TaskOrchestrator   AiwgServeHandle                         │
└─────────────────────────────────────────────────────────────┘
```
Agent state — heartbeats, metrics, setup progress, loadout metadata — is tracked in-memory via DashMap and exposed through all three interfaces.
Task Orchestrator
Submit long-running AI tasks. The orchestrator assigns each task to an available VM, monitors it through completion, and streams its logs via SSE:
```text
PENDING → STAGING → PROVISIONING → READY → RUNNING → COMPLETING → COMPLETED
↘ ↘
FAILED CANCELLED
```
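A client that polls task status needs to know when to stop. The sketch below encodes only what the diagram states for certain: the happy-path chain and the three end states. The function names are hypothetical, and it assumes the API reports states using the uppercase names shown in the diagram.

```shell
# Happy-path successor of a task state, following the arrow chain in the
# diagram. FAILED and CANCELLED are reached through the diagonal arrows;
# which states can branch to them is not modelled here.
next_task_state() {
    case "$1" in
        PENDING)      echo STAGING ;;
        STAGING)      echo PROVISIONING ;;
        PROVISIONING) echo READY ;;
        READY)        echo RUNNING ;;
        RUNNING)      echo COMPLETING ;;
        COMPLETING)   echo COMPLETED ;;
        *)            return 1 ;;      # terminal or unknown state
    esac
}

# A state is terminal when no further transition follows it.
is_terminal_state() {
    case "$1" in
        COMPLETED|FAILED|CANCELLED) return 0 ;;
        *) return 1 ;;
    esac
}

# Walk the happy path from PENDING until a terminal state is reached.
state=PENDING
while ! is_terminal_state "$state"; do
    printf '%s -> ' "$state"
    state=$(next_task_state "$state")
done
echo "$state"
```

In a polling loop, `is_terminal_state` would be the exit condition, with the status fetched from `GET /api/v1/tasks/{id}` on each iteration.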
Tasks receive a dedicated workspace in agentshare:
```text
/srv/agentshare/
├── tasks/{task_id}/manifest.yaml   # Task metadata
├── inbox/{task_id}/                # Input files (read-only inside VM)
└── outbox/{task_id}/               # Artifacts written by agent
```
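For host-side scripts that stage inputs or collect outputs, the layout above maps to three paths per task. A minimal sketch follows; the `AGENTSHARE_ROOT` variable and the `task_workspace` name are assumptions made for illustration, with the root defaulting to the `/srv/agentshare` path shown in the tree.

```shell
# Print the host-side paths that belong to one task ID.
# AGENTSHARE_ROOT is a hypothetical override for non-default installs.
task_workspace() {
    root=${AGENTSHARE_ROOT:-/srv/agentshare}
    echo "manifest=$root/tasks/$1/manifest.yaml"
    echo "inbox=$root/inbox/$1"
    echo "outbox=$root/outbox/$1"
}

# "example-task-id" is a placeholder, not a real task ID format.
task_workspace example-task-id
```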
Agentshare Storage
VMs get virtiofs-mounted shared storage with separate read-only and read-write namespaces:
| Mount | VM Path | Mode | Purpose |
|---|---|---|---|
| Global | `/mnt/global` (`~/global`) | Read-only | Shared tools, prompts, configs |
| Inbox | `/mnt/inbox` (`~/inbox`) | Read-write | Task inputs, run logs, outputs |
The inbox layout provides structured access patterns — agents find their task workspace at ~/inbox/current/ without needing to know task IDs.
Human-in-the-Loop (HITL)
The management server monitors PTY output and automatically detects when an agent is waiting for human input. Detection runs after every output chunk through a scored heuristic that recognizes patterns like (y/n), [Y/n], Human:, ❯, and explicit confirmation phrases.
```text
Agent PTY output
      │
      ▼
prompt_detector::detect_prompt()    ← scores output chunk
      │
  score ≥ 0.85
      │
      ▼
HitlStore::create()                 ← deduplicates per session
      │
      ├── REST: GET /api/v1/hitl         (operator polls)
      ├── Dashboard: pending requests UI
      └── AiwgServeHandle::emit()        (if aiwg serve wired in)
      │
  operator responds
      │
      ▼
POST /api/v1/hitl/{id}/respond      ← injects text into PTY stdin
```
One pending request per session at a time — duplicate detections are suppressed until the active request is resolved.
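To make the "score, then threshold" idea concrete, here is a deliberately simplified sketch. Only the pattern list and the 0.85 threshold come from the description above. The individual weights, the function names, and the first-match-wins scoring are invented for illustration and do not reflect the actual `prompt_detector` implementation.

```shell
# Score a PTY chunk as an integer percentage (0-100), since POSIX shell
# arithmetic has no floating point. All weights are made up.
score_prompt() {
    case "$1" in
        *'(y/n)'*|*'[Y/n]'*) echo 95 ;;   # explicit yes/no prompt
        *'Human:'*)          echo 90 ;;   # chat-style turn marker
        *'❯'*)               echo 85 ;;   # input marker
        *)                   echo 0  ;;   # nothing that looks like a prompt
    esac
}

# Apply the threshold: 85 stands in for the documented score >= 0.85.
needs_human() {
    [ "$(score_prompt "$1")" -ge 85 ]
}

for chunk in "Overwrite existing file? (y/n) " "Compiling crate 12/40"; do
    if needs_human "$chunk"; then
        echo "HITL request: $chunk"
    else
        echo "no prompt:    $chunk"
    fi
done
```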
aiwg Serve Integration
When AIWG_SERVE_ENDPOINT is set, the management server registers with an aiwg serve dashboard and streams live sandbox events over a persistent authenticated WebSocket. The integration reconnects with exponential backoff (1 s → 30 s) and never blocks server startup.
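The reconnect schedule can be pictured as follows. The text gives only the 1 s and 30 s endpoints, so the doubling factor here is an assumption, and jitter (which real reconnect loops often add) is left out.

```shell
# Print the first N reconnect delays in seconds: start at 1, double after
# each failed attempt, and cap at 30. Hypothetical helper for illustration.
backoff_schedule() {
    delay=1
    i=0
    out=
    while [ "$i" -lt "$1" ]; do
        out="${out:+$out }$delay"          # append, space-separated
        delay=$((delay * 2))
        if [ "$delay" -gt 30 ]; then
            delay=30                       # cap reached, stay here
        fi
        i=$((i + 1))
    done
    echo "$out"
}

backoff_schedule 8   # prints: 1 2 4 8 16 30 30 30
```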
The sandbox additionally registers as an AIWG executor (per executor.v1.md), accepting mission dispatches via POST /api/v1/sessions/:id/dispatch and reporting the full mission.* lifecycle (assigned → started → completed/failed/aborted, with HITL and resumability) over a second WS at /ws/executors/{id}. Mission state persists across mgmt-server restarts in <secrets_dir>/../missions.json. Full integration spec: docs/aiwg-executor.md.
| Event | Trigger |
|---|---|
| `agent.connected` | gRPC stream registered |
| `agent.disconnected` | gRPC stream closed or timed out |
| `agent.ready` | cloud-init provisioning complete |
| `agent.provisioning` | loadout step progress |
| `session.start` / `session.end` | PTY/exec session lifecycle |
| `hitl.input_required` | HITL prompt detected |
A Real Walkthrough
What a typical autonomous coding task looks like end to end.
Provision
```shell
./images/qemu/provision-vm.sh agent-01 \
  --loadout profiles/claude-only.yaml \
  --agentshare \
  --start
```
The VM boots, cloud-init runs the loadout manifest, agent-client registers via gRPC, and the status transitions Starting → Provisioning → Ready. If aiwg serve is configured, agent.ready fires.
Submit a Task
```shell
curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Refactor the authentication module to use JWT refresh tokens",
    "repository": "https://github.com/myorg/myapp",
    "model": "claude-opus-4-6",
    "timeout_seconds": 7200
  }'
```
The task is assigned to agent-01, the repository is cloned into the inbox, and Claude Code is launched inside the VM.
Monitor in Real Time
Open http://localhost:8122 for the live terminal stream, or:
```shell
curl http://localhost:8122/api/v1/tasks/{task_id}/logs
```
Agent Pauses — HITL
An hour in, Claude Code hits an ambiguous refactor decision and prints a confirmation prompt. The dashboard shows a pending HITL request. Respond without opening a terminal:
```shell
curl -X POST http://localhost:8122/api/v1/hitl/{hitl_id}/respond \
  -H "Content-Type: application/json" \
  -d '{"response": "yes, update all callers"}'
```
The response text is injected into the agent's PTY stdin and the agent continues.
Collect Artifacts
```shell
ls /srv/agentshare/outbox/{task_id}/
# auth-module/  jwt-refresh.ts  test-results.json  SUMMARY.md
```
Provisioning
Profiles
Pre-built profiles for common setups:
| Profile | Tools | Use Case |
|---|---|---|
| `agentic-dev` | Python (uv), Node.js (fnm), Go, Rust, Claude Code, Aider, Docker, ripgrep, fd, jq | Full development environment |
| `basic` | SSH, basic utilities | Minimal; custom setup via cloud-init |
```shell
./images/qemu/provision-vm.sh my-agent \
  --profile agentic-dev \
  --cpus 8 \
  --memory 16384 \
  --disk 100G \
  --agentshare \
  --start
```
Loadout Manifests
Declarative YAML manifests for composable provisioning. Loadouts specify tools, runtimes, AI providers, and AIWG frameworks without modifying base profiles:
```yaml
# profiles/claude-only.yaml
name: claude-only
tools:
  - claude-code
  - ripgrep
  - fd
  - jq
runtimes:
  - python-uv
  - nodejs-fnm
aiwg_frameworks:
  - name: sdlc-complete
    providers: [claude]
```
See docs/LOADOUTS.md for the full manifest schema and available options.
Task Orchestration
Submit tasks to agents via the REST API. The orchestrator assigns tasks to available VMs, manages the workspace, and tracks lifecycle state.
```shell
# Submit a task
curl -X POST http://localhost:8122/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Audit the API for SQL injection vulnerabilities",
    "repository": "https://github.com/myorg/myapp",
    "model": "claude-opus-4-6",
    "timeout_seconds": 3600
  }'

# Check status
curl http://localhost:8122/api/v1/tasks/{task_id}

# Stream logs (SSE)
curl http://localhost:8122/api/v1/tasks/{task_id}/logs

# List artifacts
curl http://localhost:8122/api/v1/tasks/{task_id}/artifacts

# List A2A task artifacts captured by messages:send
curl http://localhost:8122/agents/{instance_id}/v1/tasks/{task_id}/artifacts
```
See docs/task-orchestration-api.md for full API details and docs/task-run-lifecycle.md for the lifecycle state machine.
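Because the log endpoint streams Server-Sent Events, raw `curl` output includes the SSE framing. The filter below is a small sketch that keeps only the `data:` payloads. The function name is hypothetical, and whether the stream also uses `event:` or `id:` fields is not stated here, so those are simply dropped.

```shell
# Read an SSE stream on stdin and print each "data:" payload on its own
# line. Per the SSE format, one optional space after the colon is removed.
# Blank separator lines, comments (": ..."), and other fields are skipped.
sse_data() {
    sed -n 's/^data: \{0,1\}//p'
}

# Against a running server (TASK_ID is a placeholder; -N turns off curl's
# output buffering so lines appear as they arrive):
#   curl -N "http://localhost:8122/api/v1/tasks/$TASK_ID/logs" | sse_data

# Offline demonstration with a hand-written stream:
printf 'event: log\ndata: cloning repository\n\ndata: done\n\n' | sse_data
```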
Human-in-the-Loop (HITL)
The server monitors agent PTY output and automatically detects when an agent is waiting for human input. When detected, a HITL request is created and held until resolved.
```shell
# List pending requests
curl http://localhost:8122/api/v1/hitl

# Respond: text is injected directly into the agent's PTY stdin
curl -X POST http://localhost:8122/api/v1/hitl/a3f1b2.../respond \
  -H "Content-Type: application/json" \
  -d '{"response": "y"}'
```
Requests are deduplicated per session — a second prompt won't fire while the first is pending. Once resolved, the slot opens again.
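The one-slot-per-session rule can be illustrated with a toy model that uses directories as markers. This is not how the server stores requests (that is the job of its HitlStore component); `HITL_DIR` and both function names are hypothetical.

```shell
# mkdir is atomic and fails if the directory already exists, which gives
# "at most one pending request per session" for free.
HITL_DIR=${HITL_DIR:-$(mktemp -d)}

hitl_create() {    # succeeds only if SESSION has no pending request
    mkdir "$HITL_DIR/$1" 2>/dev/null
}

hitl_resolve() {   # frees the slot so a later prompt can file a request
    rmdir "$HITL_DIR/$1" 2>/dev/null
}

hitl_create sess-1 && echo "request filed"
hitl_create sess-1 || echo "duplicate suppressed"
hitl_resolve sess-1 && echo "request resolved"
hitl_create sess-1 && echo "slot reopened, new request filed"
```

The failed second `hitl_create` plays the role of the 409 response that `POST /api/v1/agents/{id}/hitl` returns on a duplicate.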
VM Lifecycle
```shell
# Provision and start
./images/qemu/provision-vm.sh agent-01 --profile agentic-dev --agentshare --start

# Lifecycle management
virsh start agent-01      # start stopped VM
virsh shutdown agent-01   # graceful stop
virsh destroy agent-01    # force stop

# Rebuild (preserves IP and config)
./scripts/reprovision-vm.sh agent-01 --profile agentic-dev

# Remove completely
./scripts/destroy-vm.sh agent-01

# Deploy updated agent binary to running VM
./scripts/deploy-agent.sh agent-01 --debug
```
See docs/vm-lifecycle.md for the state machine and docs/LIFECYCLE.md for the full operations reference.
API Reference
Agents
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/agents` | GET | List registered agents with metrics and loadout info |
| `/api/v1/agents/{id}` | GET | Get agent details |
| `/api/v1/agents/{id}` | DELETE | Remove agent |
| `/api/v1/agents/{id}/start` | POST | Start agent VM |
| `/api/v1/agents/{id}/stop` | POST | Stop agent VM |
| `/api/v1/agents/{id}/destroy` | POST | Force destroy agent VM |
| `/api/v1/agents/{id}/reprovision` | POST | Reprovision agent VM |
Tasks
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/tasks` | GET | List tasks |
| `/api/v1/tasks` | POST | Submit new task |
| `/api/v1/tasks/{id}` | GET | Get task status and metadata |
| `/api/v1/tasks/{id}` | DELETE | Cancel task |
| `/api/v1/tasks/{id}/logs` | GET | Stream task logs (SSE) |
| `/api/v1/tasks/{id}/artifacts` | GET | List task artifacts |
| `/agents/{instance_id}/v1/tasks/{task_id}/artifacts` | GET | List persisted A2A task artifacts |
| `/agents/{instance_id}/v1/tasks/{task_id}/artifacts/{artifact_id}` | GET | Return one persisted A2A task artifact |
VMs
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/vms` | GET | List all VMs |
| `/api/v1/vms` | POST | Create VM |
| `/api/v1/vms/{name}` | GET | Get VM details |
| `/api/v1/vms/{name}/start` | POST | Start VM |
| `/api/v1/vms/{name}/stop` | POST | Graceful stop |
| `/api/v1/vms/{name}/destroy` | POST | Force stop |
| `/api/v1/vms/{name}` | DELETE | Delete VM |
Human-in-the-Loop
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/hitl` | GET | List pending HITL requests |
| `/api/v1/agents/{id}/hitl` | POST | Create HITL request for agent (returns 409 on duplicate) |
| `/api/v1/hitl/{id}/respond` | POST | Submit response; injects text into PTY stdin |
Screen Observer
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/sessions/{id}/screen` | GET | Current PTY screen snapshot (no WebSocket needed) |
| `/ws/sessions/{id}/orchestrate` | WS | Live screen updates; defaults to observer/read-only. Add `?role=controller` to allow write/resize/signal frames. |
System
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/secrets` | GET / POST / DELETE | Manage agent authentication secrets |
| `/api/v1/events` | GET | VM lifecycle event stream (SSE) |
| `/healthz` | GET | Liveness probe |
| `/readyz` | GET | Readiness probe |
| `/metrics` | GET | Prometheus metrics |
gRPC (Port 8120)
```protobuf
service AgentService {
  rpc Connect(stream AgentMessage) returns (stream ManagementMessage);
  rpc Exec(ExecRequest) returns (stream ExecOutput);
}
```
WebSocket (Port 8121)
Real-time push of agent metrics, PTY output, session events, and task progress. Used by the dashboard and external monitoring clients.
Configuration
Management Server
| Variable | Default | Description |
|---|---|---|
| `LISTEN_ADDR` | `0.0.0.0:8120` | gRPC listen address (WS = port+1, HTTP = port+2) |
| `SECRETS_DIR` | `.run/secrets` | Directory containing agent-hashes.json |
| `RUST_LOG` | `info` | Log level: trace, debug, info, warn, error |
| `LOG_FORMAT` | `pretty` | Log format: pretty, json, compact |
| `HEARTBEAT_TIMEOUT` | `90` | Seconds before marking agent disconnected |
| `METRICS_ENABLED` | `true` | Enable Prometheus metrics export |
| `AIWG_SERVE_ENDPOINT` | (none) | aiwg serve base URL (integration disabled if unset) |
| `AIWG_SERVE_NAME` | `agentic-sandbox` | Display name in aiwg serve dashboard |
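Since the WebSocket and HTTP ports are derived from `LISTEN_ADDR`, changing the gRPC port moves all three. The helper below is a hypothetical sketch of that arithmetic for use in wrapper scripts; it assumes the `host:port` form shown in the table.

```shell
# Derive the three listen ports from a host:port address using the
# "WS = port+1, HTTP = port+2" rule. Only the text after the last colon
# is used, so bracketed IPv6 hosts such as [::]:8120 also work.
derive_ports() {
    addr=${1:-0.0.0.0:8120}
    grpc=${addr##*:}
    echo "grpc=$grpc ws=$((grpc + 1)) http=$((grpc + 2))"
}

derive_ports                   # grpc=8120 ws=8121 http=8122
derive_ports 127.0.0.1:9000    # grpc=9000 ws=9001 http=9002
```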
Agent Client
| Variable | Required | Description |
|---|---|---|
| `AGENT_ID` | Yes | Unique identifier for this agent |
| `AGENT_SECRET` | Yes | 256-bit shared secret for authentication |
| `MANAGEMENT_SERVER` | Yes | Server address, e.g. `192.168.122.1:8120` |
| `HEARTBEAT_INTERVAL` | No | Seconds between heartbeats (default: 30) |
To override these settings without changing your shell environment, set them in management/.run/dev.env.
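With the defaults (`HEARTBEAT_INTERVAL` of 30 and `HEARTBEAT_TIMEOUT` of 90), an agent can miss two consecutive heartbeats before the timeout elapses. The sketch below shows that relationship. The function name and the strict "greater than" comparison are assumptions; the server may treat the boundary differently.

```shell
# Succeed when more than TIMEOUT seconds separate LAST_SEEN from NOW.
# Both timestamps are Unix seconds; TIMEOUT defaults to 90.
heartbeat_expired() {   # usage: heartbeat_expired LAST_SEEN NOW [TIMEOUT]
    [ $(( $2 - $1 )) -gt "${3:-90}" ]
}

now=$(date +%s)
for silent in 60 120; do
    if heartbeat_expired $((now - silent)) "$now"; then
        echo "${silent}s without heartbeat: disconnected"
    else
        echo "${silent}s without heartbeat: still connected"
    fi
done
```

When raising `HEARTBEAT_INTERVAL` on the agent side, keep `HEARTBEAT_TIMEOUT` comfortably above it, otherwise healthy agents will flap between connected and disconnected.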
Monitoring
The management server exports Prometheus metrics at /metrics:
```text
agentic_agents_connected        # Connected agent count
agentic_agents_ready            # Ready agents
agentic_tasks_running           # Active tasks
agentic_tasks_completed_total   # Total completed tasks
agentic_commands_total          # Commands dispatched
agentic_commands_duration_ms    # Command execution latency (histogram)
```
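For quick checks without a full Prometheus setup, a single value can be pulled out of the `/metrics` output. This sketch assumes the standard Prometheus text format and handles only unlabelled series; the function name is hypothetical.

```shell
# Print the value of an unlabelled metric read from stdin.
# HELP/TYPE lines never match because their first field is "#".
# Labelled series such as name{le="5"} are not handled, so this does
# not work for the histogram metric.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Against a running server on the default HTTP port:
#   curl -s http://localhost:8122/metrics | metric_value agentic_agents_ready

# Offline demonstration with hand-written sample input:
printf '# TYPE agentic_agents_ready gauge\nagentic_agents_ready 2\n' |
    metric_value agentic_agents_ready
```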
Set up Prometheus and AlertManager:
```shell
cd scripts/prometheus && ./deploy.sh
# Prometheus:   http://localhost:9090
# AlertManager: http://localhost:9093
```
See docs/monitoring.md and docs/observability/ for alerting rules and dashboards.
Development
```shell
# Full cycle: rebuild server + agent, deploy to all running VMs
./scripts/dev-deploy-all.sh --debug

# Deploy agent binary to a specific VM
./scripts/deploy-agent.sh agent-01 --debug

# Management server live-reload
cd management && ./dev.sh

# E2E tests
./scripts/run-e2e-tests.sh

# Chaos tests
./scripts/chaos/run-all.sh

# Unit tests
cd management && cargo test
cd agent-rs && cargo test
```
Directory Structure
```text
agentic-sandbox/
├── management/                  # Management server (Rust)
│   ├── src/
│   │   ├── http/                # REST API handlers
│   │   ├── orchestrator/        # Task orchestration engine
│   │   ├── telemetry/           # Logging, metrics, tracing
│   │   ├── ws/                  # WebSocket hub and connections
│   │   ├── hitl.rs              # HITL request store
│   │   ├── aiwg_serve.rs        # Outbound aiwg serve integration
│   │   ├── screen_state.rs      # PTY screen observer
│   │   ├── prompt_detector.rs   # HITL prompt heuristics
│   │   └── crash_loop.rs        # Crash loop detection
│   └── ui/                      # Embedded web dashboard
├── agent-rs/                    # Agent client (Rust)
├── cli/                         # CLI tool (VM management)
├── proto/                       # gRPC protocol definitions
├── images/qemu/                 # VM provisioning scripts and loadout profiles
├── scripts/                     # Utility and deployment scripts
├── configs/                     # Security profiles (seccomp)
├── docs/                        # Reference documentation
└── tests/e2e/                   # End-to-end tests (pytest)
```
Documentation
| Document | Description |
|---|---|
| Architecture | System design and component relationships |
| Positioning | Design axes and when this is (or isn't) a good fit |
| API Reference | Complete HTTP, gRPC, and WebSocket API |
| WebSocket Protocol | Per-message reference: legacy agent-scoped + formal session-registry protocols |
| CLI Design | sandboxctl operator/admin CLI taxonomy and acceptance criteria |
| Deployment Guide | Installation and production configuration |
| Operations Guide | Day-to-day operations and runbooks |
| Loadouts | Declarative VM provisioning manifests |
| Agentshare Storage | virtiofs storage layout and usage |
| Task Orchestration | Task API and lifecycle |
| Task Run Lifecycle | State machine and transitions |
| Session Reconciliation | Session recovery after restarts |
| VM Lifecycle | VM state machine and management |
| Troubleshooting | Common issues and fixes |
| Monitoring | Prometheus metrics and alerting |
| Observability | Full observability setup |
| Reliability | Reliability patterns and quickstart |
Roadmap
- QEMU/KVM provisioning with cloud-init
- Management server (Rust/gRPC/WebSocket/HTTP)
- Agent client with registration, heartbeat, and metrics
- virtiofs shared storage (global/inbox)
- Web dashboard with live terminal access
- Task orchestration with artifact collection
- Claude Code integration
- `sandboxctl` operator/admin CLI (design)
- Declarative loadout manifest system
- Prometheus metrics and AlertManager alerting
- Session reconciliation after server restart
- VM pooling and resource quotas
- PTY screen observer (server-side virtual terminal snapshots)
- Human-in-the-Loop detection and REST API
- aiwg serve outbound registration and event streaming
- Crash loop detection and alerting
- Docker runtime with rootless containers
- Multi-host orchestration
- Kubernetes operator
License
MIT — see LICENSE