arli
Health Warn
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Fail
- rm -rf — Recursive force deletion command in install.sh
Permissions Pass
- Permissions — No dangerous permissions requested
Production harness between LLMs and the real world. Kernel sandbox that cannot escape. Verification pipeline before delivery. On-chain attestation via ENSO/ICP. Fault-tolerant execution with rollback. Self-healing memory. 36 providers. 20 platforms. Live trading. Rust.
ARLI
Production-Grade AI Agent Harness — Rust. Kernel Sandbox. On-Chain Settlement.
Single 12MB binary. Zero runtime dependencies beyond the kernel.
Actor-based agent loop. Swarm orchestration. 20-platform messaging gateway.
Landlock + seccomp sandbox. OCSF audit trail. ENSO/ICP on-chain attestation.
Hyperliquid live trading. Self-healing experiential memory. Autonomous oracle loop.
What ARLI Is
ARLI is a complete agent harness — the infrastructure layer between an LLM and the real world. It doesn't just call models. It:
- Executes agent actions inside a kernel-level sandbox (Landlock + seccomp)
- Verifies work before claiming it's done (compile → lint → test → fuzz pipeline)
- Attests on-chain via ENSO/ICP — cryptographic proof of what ran, how, and under what policy
- Learns from failures — experiential memory that remembers fixes and auto-applies them
- Survives faults — workspace snapshots with rollback on failure, clean retries every time
- Explains failures — classifies why something broke (sandbox killed it? agent wrote bad code? ENSO rejected?)
- Governs itself — risk scoring, approval queues, audit trail for every action
- Self-analyzes — arli harness analyze reads telemetry and tells you what to fix
Everything is Rust. Everything is local. No Python. No Docker. No cloud dependency.
Architecture
┌──────────────────────────────────────────────────────────────┐
│ EXTERNAL — 20 Messaging Platforms │
│ Telegram Discord Slack WhatsApp Matrix Teams Email │
│ Signal LINE Feishu WeCom QQ SMS Google Chat DingTalk │
│ IRC ntfy SimpleX Yuanbao BlueBubbles Webhooks │
└──────────────────────┬───────────────────────────────────────┘
│
┌───────────────▼──────────────────────────┐
│ arli-gateway Daemon │
│ Built-in daemon (no systemd required) │
│ Per-chat agent sessions │
│ Prometheus /metrics /healthz /readyz │
└──────┬──────────────┬─────────────────────┘
│ │
┌─────────────▼──┐ ┌────────▼─────────────────────────┐
│ Agent Actor │ │ Swarm Orchestrator │
│ Mailbox-driven│ │ spawn/steer/kill/restart │
│ Context press.│ │ TaskRouter · fan-out · affinity │
│ Auto-compact. │ │ Shared Memory (agent blackboard) │
│ Hook system │ │ Agent registry (SQLite) │
└───────┬────────┘ └───────────────────────────────────┘
│
│ ┌──────────────────────────────────────────────┐
│ │ HARNESS LAYER — executes BEFORE attestation │
│ │ │
│ │ ┌──────────────┐ ┌───────────────────────┐ │
│ │ │ Verification │ │ Fault-Tolerant │ │
│ │ │ Pipeline │ │ Sandbox │ │
│ │ │ compile→lint │ │ snapshot→execute→ │ │
│ │ │ →test→fuzz │ │ success=commit │ │
│ │ └──────────────┘ │ failure=rollback │ │
│ │ └───────────────────────┘ │
│ │ ┌──────────────┐ ┌───────────────────────┐ │
│ │ │ Governance │ │ Failure Attribution │ │
│ │ │ Engine │ │ │ │
│ │ │ risk score→ │ │ SandboxKilled? │ │
│ │ │ approve/deny │ │ EnsoRejected? │ │
│ │ │ /queue │ │ VerificationFailed? │ │
│ │ └──────────────┘ └───────────────────────┘ │
│ └──────────────────────────────────────────────┘
│
┌───────▼──────────────────────────────────────────────────┐
│ 18 BUILT-IN TOOLS │
│ terminal read write patch search hashedit ast_edit │
│ resolve browser web_search vision voice memory │
│ image_generate video_generate delegate execute_code │
└───────┬──────────────────────────────────────────────────┘
│
┌───────┼──────────────┬──────────────────┬─────────────────┐
│ │ │ │ │
│ 36 PROVIDERS STORAGE LIVE TRADING ENSO/ICP│
│ 3 adapters SQLite+WAL Hyperliquid SDK Attest. │
│ OpenAI-compat FTS5 search Perps · Spot Payment │
│ Anthropic 12 backends WS fusion feed Oracle │
│ OpenRouter Skill loader Cointegration loop │
└───────────────────────────────────────────────────────────┘
LEARNING LAYER (persisted to ~/.arli/)
┌───────────────────────────────────────────────────────────┐
│ Experiential Memory Harness Telemetry Task Contracts│
│ failure→fix lessons per-tool metrics upfront work │
│ auto-apply on retry failure rates declarations │
│ verified/unverified memory hit rate SHA-256 hash │
│ stale pruning policy violations in attest. │
│ │
│ Task State Quality Critic Evolution CLI│
│ phased execution heuristic + LLM harness │
│ trail for ENSO response review analyze │
└───────────────────────────────────────────────────────────┘
Quick Start
# Install (Linux / macOS, no Rust toolchain required)
curl -fsSL https://raw.githubusercontent.com/ARLI-Research/arli/main/install.sh | bash
# Configure
arli setup
# Chat
arli chat # Interactive TUI
arli chat -q "Fix this bug" # Single query
# Gateway
arli gateway start # 20-platform messaging daemon
# Self-update
arli update
The Harness — What Makes ARLI Different
A harness is not just a tool-calling loop. It's the infrastructure that makes agent execution verifiable, fault-tolerant, and auditable. ARLI implements the full stack from the "Code as Agent Harness" survey (458 papers, 2025–2026).
Fault-Tolerant Sandbox
Before executing a contract, ARLI snapshots the workspace. If anything goes wrong — verification fails, ENSO disputes, sandbox kills the process — the workspace rolls back to a clean state. The next retry starts from the same baseline, not from garbage.
Contract received → snapshot workspace → execute → ✓ commit / ✗ rollback
Git-based when available, tar.gz fallback for non-git workspaces.
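The snapshot → execute → commit/rollback flow can be sketched in a few lines. This is a minimal illustration, not ARLI's implementation: Workspace and run_with_rollback are hypothetical names, and the workspace is modelled as an in-memory map rather than a git checkout or tar.gz archive.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for a workspace: path -> file contents.
/// Hypothetical type; the real harness snapshots directories on disk.
#[derive(Clone, Debug, PartialEq)]
pub struct Workspace {
    pub files: BTreeMap<String, String>,
}

/// Runs `task` against the workspace. On `Err` the workspace is restored
/// to the snapshot taken before execution; on `Ok` the changes are kept.
pub fn run_with_rollback<T, E>(
    ws: &mut Workspace,
    task: impl FnOnce(&mut Workspace) -> Result<T, E>,
) -> Result<T, E> {
    let snapshot = ws.clone(); // snapshot workspace
    match task(ws) {
        // commit: keep the mutated workspace
        Ok(value) => Ok(value),
        // rollback: discard any partial writes
        Err(err) => {
            *ws = snapshot;
            Err(err)
        }
    }
}
```

Because the snapshot is restored on every failure path, each retry starts from the same baseline regardless of how far the failed attempt got.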
Hierarchical Sandbox Profiles
Not one sandbox — four. Each contract declares what level it needs:
| Profile | Network | Memory | Timeout | Use |
|---|---|---|---|---|
| Build | No | 1 GB | 5 min | Compile/lint only |
| Test | Yes | 2 GB | 10 min | Build + test fixtures |
| Deploy | Full | 4 GB | 20 min | Integration, staging |
| Unsafe | Full | None | None | Trusted code only |
The harness enforces that the actual sandbox is at least as restrictive as the contract demands. A Build contract cannot run under Unsafe.
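One way to express the "at least as restrictive" rule is to order the profiles and compare them. The sketch below assumes the four profiles form a strict chain from Build (most restrictive) to Unsafe (least); the Profile enum, limits, and satisfies are illustrative names, with the limits copied from the table above.

```rust
/// Sandbox profiles, declared from most to least restrictive.
/// Deriving `Ord` on an enum compares variants by declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Profile {
    Build,
    Test,
    Deploy,
    Unsafe,
}

impl Profile {
    /// (memory limit in GB, timeout in minutes); `None` means no limit.
    pub fn limits(self) -> (Option<u32>, Option<u32>) {
        match self {
            Profile::Build => (Some(1), Some(5)),
            Profile::Test => (Some(2), Some(10)),
            Profile::Deploy => (Some(4), Some(20)),
            Profile::Unsafe => (None, None),
        }
    }
}

/// True when the sandbox actually in use is at least as restrictive
/// as the profile the contract declared.
pub fn satisfies(required: Profile, actual: Profile) -> bool {
    actual <= required
}
```

With this ordering, satisfies(Profile::Build, Profile::Unsafe) is false, which matches the rule that a Build contract cannot run under Unsafe.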
Verification Pipeline
Attestation doesn't happen on faith. Before submitting to ENSO, ARLI runs:
compile → lint → test → fuzz
Auto-detects commands from the workspace (Cargo.toml → cargo check, package.json → npm test, etc.). If compile or test fails — attestation is blocked. ENSO never sees a "Verified" on broken code.
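A rough sketch of the detection and gating logic follows. Only the two mappings named above (Cargo.toml → cargo check, package.json → npm test) come from the text; the Stage enum, detect_plan, run_pipeline, and the choice to let a failed lint or fuzz stage pass through are assumptions made for illustration. The command runner is passed in as a closure so the logic can be exercised without spawning processes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Compile,
    Lint,
    Test,
    Fuzz,
}

/// Maps marker files found in the workspace to pipeline commands.
pub fn detect_plan(files: &[&str]) -> Vec<(Stage, &'static str)> {
    let mut plan = Vec::new();
    if files.contains(&"Cargo.toml") {
        plan.push((Stage::Compile, "cargo check"));
    }
    if files.contains(&"package.json") {
        plan.push((Stage::Test, "npm test"));
    }
    plan
}

/// Stages whose failure blocks attestation, per the text above.
pub fn blocks_attestation(stage: Stage) -> bool {
    matches!(stage, Stage::Compile | Stage::Test)
}

/// Runs each command in order. Returns the first blocking stage that
/// failed, or `Ok(())` when attestation may proceed.
/// Assumption: non-blocking failures (lint, fuzz) do not stop the run.
pub fn run_pipeline(
    plan: &[(Stage, &str)],
    mut run: impl FnMut(&str) -> bool,
) -> Result<(), Stage> {
    for (stage, cmd) in plan {
        if !run(*cmd) && blocks_attestation(*stage) {
            return Err(*stage);
        }
    }
    Ok(())
}
```

In a real harness the closure would wrap std::process::Command and report whether the exit status was successful.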
Task Contracts
Upfront declaration of what the agent will do. Before execution, the contract specifies expected artifacts and success checks. After execution, the contract hash is included in the ENSO attestation — cryptographic proof that the agent delivered exactly what it promised.
let contract = TaskContract {
goal: "Fix compilation error in src/main.rs",
expected_artifacts: vec!["src/main.rs"],
success_checks: vec!["cargo check", "cargo test"],
sandbox_policy: Some("build"),
};
// Contract hash: sha256("goal=...&artifacts=...&checks=...")
// Included in ArliAttestation → ENSO verifies it
Experiential Memory
The agent learns from failures. When cargo build fails with "borrow checker", ARLI records the fix ("clone before move"). Next time the same error appears, it auto-applies the fix without wasting a retry.
- MemGovern filter: fixes must be verified to persist. Unverified fixes older than 7 days are auto-pruned.
- Memory health: arli harness analyze shows hit rate — is the memory actually helping or just noise?
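The record, lookup, prune, and hit-rate behaviour described above could look roughly like this. It is a sketch under stated assumptions: Lesson, Memory, and substring matching on the error text are all hypothetical, and lesson age is passed in directly instead of being derived from timestamps.

```rust
use std::time::Duration;

/// Unverified lessons older than this are pruned (7 days, per the text).
const MAX_UNVERIFIED_AGE: Duration = Duration::from_secs(7 * 24 * 60 * 60);

#[derive(Debug, Clone)]
pub struct Lesson {
    pub error_pattern: String,
    pub fix: String,
    pub verified: bool,
    pub age: Duration,
}

#[derive(Default)]
pub struct Memory {
    pub lessons: Vec<Lesson>,
    lookups: u32,
    hits: u32,
}

impl Memory {
    pub fn record(&mut self, lesson: Lesson) {
        self.lessons.push(lesson);
    }

    /// Returns the first stored fix whose pattern occurs in the error
    /// text, and counts the lookup towards the hit rate.
    pub fn lookup(&mut self, error: &str) -> Option<&str> {
        self.lookups += 1;
        let found = self
            .lessons
            .iter()
            .find(|l| error.contains(l.error_pattern.as_str()));
        if found.is_some() {
            self.hits += 1;
        }
        found.map(|l| l.fix.as_str())
    }

    /// Keeps verified lessons and unverified ones that are still fresh.
    pub fn prune_stale(&mut self) {
        self.lessons
            .retain(|l| l.verified || l.age <= MAX_UNVERIFIED_AGE);
    }

    /// Fraction of lookups that found a lesson; 0.0 before any lookup.
    pub fn hit_rate(&self) -> f64 {
        if self.lookups == 0 {
            0.0
        } else {
            f64::from(self.hits) / f64::from(self.lookups)
        }
    }
}
```

A low hit rate on a large lesson set suggests the stored patterns are too specific to recur, which is the "noise" case the health check is meant to surface.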
Harness Telemetry
Per-tool, per-policy, per-task metrics. Persisted to ~/.arli/telemetry.json.
$ arli harness analyze
═══ ARLI Harness Analytics ═══
─── Tools ───
Total calls: 1247 | Failures: 89 | Rate: 7.1%
Worst: terminal (23% failure, 156 calls)
─── Sandbox Policies ───
policy-abc: 12 violations — may be too restrictive
─── Experiential Memory ───
Lessons: 47 total | 31 verified | 16 unverified
Hit rate: 68% — effective
─── Recommendations ───
1. Fix tool 'terminal': fails 23% — check timeout config
2. 16 unverified lessons — run verification or prune stale
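The per-tool figures in the report above are simple ratios; for example 89 failures over 1247 calls is about 7.1%. A minimal aggregation sketch is shown below. ToolStats and Telemetry are hypothetical names, and persistence to telemetry.json is left out.

```rust
use std::collections::HashMap;

#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct ToolStats {
    pub calls: u64,
    pub failures: u64,
}

impl ToolStats {
    /// Failures divided by calls; 0.0 when the tool was never called.
    pub fn failure_rate(&self) -> f64 {
        if self.calls == 0 {
            0.0
        } else {
            self.failures as f64 / self.calls as f64
        }
    }
}

#[derive(Debug, Default)]
pub struct Telemetry {
    tools: HashMap<String, ToolStats>,
}

impl Telemetry {
    /// Records one tool invocation and whether it succeeded.
    pub fn record(&mut self, tool: &str, success: bool) {
        let stats = self.tools.entry(tool.to_string()).or_default();
        stats.calls += 1;
        if !success {
            stats.failures += 1;
        }
    }

    /// Totals across all tools.
    pub fn overall(&self) -> ToolStats {
        self.tools.values().fold(ToolStats::default(), |acc, s| ToolStats {
            calls: acc.calls + s.calls,
            failures: acc.failures + s.failures,
        })
    }

    /// Tool with the highest failure rate, if any tool was recorded.
    pub fn worst_tool(&self) -> Option<(&str, f64)> {
        self.tools
            .iter()
            .map(|(name, s)| (name.as_str(), s.failure_rate()))
            .max_by(|a, b| a.1.total_cmp(&b.1))
    }
}
```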
Failure Attribution
When a contract fails, ARLI doesn't just say "Disputed". It classifies why:
| Category | Example | Retry? |
|---|---|---|
| VerificationFailed | cargo test returned exit code 1 | No — fix the code |
| SandboxKilled | Process killed by OOM, SIGKILL | Yes — increase limits |
| EnsoRejected | agent_id does not match contract | No — fix registration |
| NetworkError | ICP call timed out | Yes — retry with backoff |
| InternalError | Failed to serialize attestation | No — investigate |
45+ error patterns matched. Confidence scored. Actionable operator guidance.
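The category table maps naturally onto an enum with a retry decision attached. The sketch below is illustrative only: the substring patterns are a handful taken from the example column, not ARLI's actual pattern set, and falling back to InternalError for unmatched messages is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureCategory {
    VerificationFailed,
    SandboxKilled,
    EnsoRejected,
    NetworkError,
    InternalError,
}

impl FailureCategory {
    /// Retry column from the table: only resource kills and network
    /// errors are worth retrying without changing code or registration.
    pub fn retryable(self) -> bool {
        matches!(
            self,
            FailureCategory::SandboxKilled | FailureCategory::NetworkError
        )
    }
}

/// Classifies an error message by case-insensitive substring match.
/// Patterns are checked in order; the first match wins.
pub fn classify(message: &str) -> FailureCategory {
    use FailureCategory::*;
    const PATTERNS: &[(&str, FailureCategory)] = &[
        ("exit code", VerificationFailed),
        ("oom", SandboxKilled),
        ("sigkill", SandboxKilled),
        ("agent_id does not match", EnsoRejected),
        ("timed out", NetworkError),
        ("serialize", InternalError),
    ];
    let lower = message.to_lowercase();
    PATTERNS
        .iter()
        .find(|(pattern, _)| lower.contains(*pattern))
        .map(|(_, category)| *category)
        .unwrap_or(InternalError) // assumption: unknown errors need investigation
}
```

A retry loop can then branch on classify(msg).retryable() instead of string-matching at every call site.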
Agent Governance Toolkit
Every agent action flows through a governance checkpoint:
Agent wants to run "deploy" → RiskScore: 80/high
→ Exceeds approval threshold → Queued for human approval
→ Ticket gov-0001 created, expires in 5 min
→ Human approves → Action executes
→ Audit trail recorded
Risk taxonomy: SAFE(0) → LOW(20) → MEDIUM(50) → HIGH(80) → CRITICAL(100). Per-tool auto-classification. Policy fingerprint included in ENSO attestation — proves governance was active.
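The risk taxonomy and the approve/queue/deny decision can be sketched as follows. The scores come from the taxonomy above; the Policy struct and its two thresholds are hypothetical, since the text does not say how thresholds are configured or when an action is denied outright.

```rust
/// Risk levels with the scores from the taxonomy above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Risk {
    Safe = 0,
    Low = 20,
    Medium = 50,
    High = 80,
    Critical = 100,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Approve,
    Queue, // wait for human approval
    Deny,
}

/// Hypothetical policy: both thresholds are assumptions for this sketch.
pub struct Policy {
    pub approval_threshold: u8,
    pub deny_threshold: u8,
}

/// Denies at or above the deny threshold, queues anything that exceeds
/// the approval threshold, and approves the rest automatically.
pub fn decide(risk: Risk, policy: &Policy) -> Decision {
    let score = risk as u8;
    if score >= policy.deny_threshold {
        Decision::Deny
    } else if score > policy.approval_threshold {
        Decision::Queue
    } else {
        Decision::Approve
    }
}
```

With an approval threshold of 50, a deploy action scored HIGH (80) is queued, as in the flow above.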
Quality Critic
Two-layer response review before delivery:
- Heuristic (fast, free): catches empty responses, hallucination markers ("as an AI, I cannot..."), repeated sentences, missing code blocks
- LLM Critic (accurate, cheap model): structured JSON critique — score 1–10, issues with severity/category/suggestion
{
"score": 4,
"acceptable": false,
"issues": [
{
"severity": "error",
"category": "factual_error",
"description": "Claims function returns String but it returns Result<String>",
"suggestion": "Check the actual return type in src/lib.rs line 42"
}
],
"summary": "Response contains a factual error about the return type"
}
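The heuristic layer can be approximated with plain string checks. The sketch below covers three of the listed cases (empty response, a hallucination marker, repeated sentences); the function name and issue labels are hypothetical, and the missing-code-block check is omitted because it depends on what the prompt asked for.

```rust
use std::collections::HashSet;

/// Returns labels for the heuristic problems found in a response.
/// An empty vector means the heuristic layer raised nothing.
pub fn heuristic_issues(response: &str) -> Vec<&'static str> {
    let mut issues = Vec::new();
    let text = response.trim();

    if text.is_empty() {
        issues.push("empty_response");
        return issues;
    }

    // Marker phrase quoted in the text above; matched case-insensitively.
    if text.to_lowercase().contains("as an ai, i cannot") {
        issues.push("hallucination_marker");
    }

    // Split on sentence terminators and flag any exact repeat.
    let mut seen = HashSet::new();
    let repeated = text
        .split(|c: char| matches!(c, '.' | '!' | '?'))
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .any(|s| !seen.insert(s.to_lowercase()));
    if repeated {
        issues.push("repeated_sentence");
    }

    issues
}
```

Running this before the LLM critic means obviously bad responses never incur a model call.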
Multi-Agent Shared Memory
Swarm agents share a common blackboard. Thread-safe, versioned, TTL-aware.
Agent A discovers market signal → writes "market:BTC:signal=BUY"
Agent B reads "market:BTC:signal" → opens position
Agent C reads result → reports to ENSO
Namespaced queries: read_by_prefix("market:") returns all market data. read_by_author("agent-2") returns everything agent-2 discovered. Persisted to ~/.arli/shared_memory.json.
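A blackboard with these two queries might be structured as below. This is a sketch under assumptions: Blackboard, Entry, and write are hypothetical names, TTL handling and JSON persistence are left out, and only read_by_prefix and read_by_author are named in the text.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub value: String,
    pub author: String,
    pub version: u64,
}

/// Cloneable handle to a shared map; clones see the same data.
#[derive(Clone, Default)]
pub struct Blackboard {
    inner: Arc<Mutex<BTreeMap<String, Entry>>>,
}

impl Blackboard {
    /// Writes a value and returns its version (1 for a new key,
    /// previous version + 1 when overwriting).
    pub fn write(&self, key: &str, value: &str, author: &str) -> u64 {
        let mut map = self.inner.lock().expect("blackboard mutex poisoned");
        let version = map.get(key).map_or(1, |e| e.version + 1);
        map.insert(
            key.to_string(),
            Entry {
                value: value.to_string(),
                author: author.to_string(),
                version,
            },
        );
        version
    }

    /// All entries whose key starts with `prefix`, in key order.
    pub fn read_by_prefix(&self, prefix: &str) -> Vec<(String, Entry)> {
        let map = self.inner.lock().expect("blackboard mutex poisoned");
        map.iter()
            .filter(|(k, _)| k.starts_with(prefix))
            .map(|(k, e)| (k.clone(), e.clone()))
            .collect()
    }

    /// All entries last written by `author`.
    pub fn read_by_author(&self, author: &str) -> Vec<(String, Entry)> {
        let map = self.inner.lock().expect("blackboard mutex poisoned");
        map.iter()
            .filter(|(_, e)| e.author == author)
            .map(|(k, e)| (k.clone(), e.clone()))
            .collect()
    }
}
```

Each swarm agent would hold its own clone of the handle, and the version counter lets a reader tell whether a value changed since it last looked.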
Commands
arli chat                      Interactive TUI chat
arli chat -q "..."             Single query
arli setup                     Interactive configuration wizard
arli model                     Change model/provider interactively
arli doctor                    System health check
arli update                    Self-update from GitHub Releases
arli harness analyze           Analyze telemetry + lessons → insights + recommendations
arli gateway start             Start messaging daemon
arli gateway stop              Stop daemon
arli gateway status            Daemon status
arli gateway log -l N          View last N log lines
arli config show               Display current configuration
arli config set k v            Set config value
arli config path               Print config file location
arli sessions                  List recent sessions
arli cron add                  Schedule recurring task
arli cron list                 List all cron jobs
arli cron start                Start cron scheduler
arli serve -p PORT             Health check HTTP server
arli dashboard -p PORT         Web UI dashboard (metrics + logs)
arli key generate              Ed25519 keypair for ENSO attestation
arli key show                  Show public key
arli enso setup                ENSO ICP integration setup
arli enso status               Key, config, active contracts
arli enso pay <id>             Build + sign attestation
arli enso oracle               Autonomous attestation loop (daemon mode)
arli marketplace rfq-create    Create RFQ for ENSO marketplace
arli marketplace rfq-list      List open RFQs
arli marketplace stats         Marketplace statistics
arli kanban create             Create task board
arli kanban add                Add card
arli kanban show               View board
arli kanban move               Move card between columns
arli mcp                       MCP server on stdio
arli profile ...               Manage named profiles
arli webhook ...               Manage webhook subscriptions
arli checkpoint ...            Session checkpoint management
arli plugins list              List discovered plugins
arli completion bash           Generate shell completions
Live Trading — Hyperliquid
Native Rust trading via hypersdk. Perpetuals + spot, WebSocket fusion feed, cointegration signals, Prometheus metrics.
use arli_trading::fusion_live::FusionLiveTrader;
let trader = FusionLiveTrader::new(config, metrics).await?;
trader.run().await; // WebSocket loop + signals + execution
- 229 perpetual + 19 spot markets on Hyperliquid mainnet
- Cointegration signal generation (10 min candles)
- OCSF event hash embedded in every order for attestation
- Prometheus metrics: PnL, positions, signal count, execution latency
ENSO — On-Chain Settlement (ICP)
Full attestation loop: contract → sandbox execution → verification pipeline → ed25519 attestation → ICP settlement. Single atomic call.
ARLI Oracle ENSO Contracts (ICP) ICP Ledger
───── ──────────────────── ──────────
│ │
│ 1. Snapshot workspace │
│ 2. Verify (compile→test) │
│ 3. Execute agent │
│ 4. Attribute failures │
│ 5. Build OCSF attestation │
│ + ed25519 signature │
│ + task contract hash │
│ + governance fingerprint │
│ │
│ 6. submit_arli_payment ────→ │
│ (one ICP call) │ verify attestation
│ ├─ ed25519 verify
│ ├─ binary hash match
│ ├─ sandbox config match
│ ├─ Landlock enforced
│ ├─ Seccomp enforced
│ │
│ │ settlement triggered
│ │ escrow released ────→ USDC/ICP
│ │
│ ←─── ArliPaymentResult │
│ status + tx_id + amount│
Comparison
| | Hermes | Claude Code | ARLI |
|---|---|---|---|
| Language | Python | TypeScript | Rust |
| Binary size | ~200MB | ~150MB | 12MB |
| Cold start | 2–5s | 1–3s | ~50ms |
| LLM providers | 37 | 3 | 36 |
| Messaging platforms | 21 | — | 20 |
| Kernel sandbox | Partial | Partial | Landlock + seccomp |
| Fault tolerance | — | — | Snapshot + rollback |
| Verification pipeline | — | — | compile→lint→test→fuzz |
| Task contracts | — | — | Upfront declarations + hash |
| Experiential memory | Partial | — | Learn from failures, auto-apply |
| Failure attribution | — | — | 6 categories, 45+ patterns |
| Governance toolkit | Partial | — | Risk scoring + approval queue |
| Quality critic | — | — | Heuristic + LLM review |
| Shared agent memory | — | — | Swarm blackboard |
| Harness analytics | — | — | CLI: arli harness analyze |
| Swarm orchestration | Partial | Partial | Native (TaskRouter, fan-out) |
| Cron scheduler | Native | — | Native |
| MCP server | Native | — | Native |
| Self-update | — | Native | Native |
| Live trading | — | — | Hyperliquid, WebSocket, cointegration |
| On-chain settlement | — | — | ENSO/ICP, atomic verify+settle+release |
| Kanban boards | — | — | SQLite, WIP limits |
| Web dashboard | Native | — | axum + htmx |
| Audit logging | — | — | OCSF (SIEM-compatible) |
| Tests | — | — | 448 (0 fail) |
Sandbox
Kernel-level isolation. Three enforcement layers applied before the child process starts:
- Seccomp BPF — syscall whitelist. Blocks ptrace, mount, reboot, kexec_load, and 30+ other dangerous calls
- Landlock — filesystem access control at kernel level. Whitelist directories, deny everything else
- Privilege drop — initgroups → setgid → setuid before exec. Process runs as nobody (UID 65534)
let sandbox = Sandbox::from_policy(&policy)?;
let output = sandbox.execute_isolated("cargo build --release")?;
Configuration
# ~/.arli/config.toml
model = "deepseek-chat"
max_iterations = 90
[provider]
name = "deepseek"
api_key = "sk-..."
[gateway]
telegram_token = "..."
[enso]
icp_gateway = "https://icp0.io"
contracts_canister_id = "..."
[trading]
hyperliquid_wallet = "..."
Install
# One-liner
curl -fsSL https://raw.githubusercontent.com/ARLI-Research/arli/main/install.sh | bash
# From source
git clone https://github.com/ARLI-Research/arli
cd arli && cargo build --release
Tests
cargo test -p arli-core # 448 tests, 0 fail
cargo test --workspace # all crates
License
MIT