Anvil
Provider-agnostic AI dev pipeline: clarify → plan → build → review → PR across your repos, mixing LLM providers per stage with your own keys. No vendor lock-in, no markup.
The provider-agnostic AI development pipeline
Use your own keys. Mix providers per stage. Pay per token, not per seat.
Anvil ships features end-to-end — clarify, plan, build, review, PR —
across every repo in your project, on whatever model is cheapest for each stage.
No vendor lock-in. No markup. No hosted plan.
Anvil is an open-source, self-hosted AI coding agent — an end-to-end LLM pipeline (clarify → plan → build → review → PR) that runs on Claude, GPT, Gemini, and OpenRouter, or fully local via Ollama & OpenCode. It speaks the Model Context Protocol (MCP) and uses your own API keys. Written in TypeScript.
Dashboard preview — pipeline orchestration, live agent activity, knowledge graph, cost ledger.
Plan on Claude. Build on Ollama. Review on GPT. Ship on a local model.
One pipeline. Eight providers. Whatever's cheapest for each stage.
Why teams pick Anvil
- Mix providers within a single pipeline. Routing is per-stage, not per-run. A single feature can flow through …
- Cheap by design. Routing-by-stage means premium models only show up where premium …
- No vendor SDK lock-in. Every HTTP adapter is hand-rolled …
- Bring your own keys, or don't. Ollama works fully offline. OpenCode's $10/mo Zen subscription …
Quick start
# 1. Install
npm install -g @esankhan3/anvil-cli
# 2. Set up a project (interactive — answers a handful of questions)
anvil init
# 3. Open the dashboard and ship
anvil dashboard
That's the whole onboarding. anvil init creates ~/.anvil/,
seeds models.yaml, scaffolds your project's factory.yaml, and
runs a health check. anvil dashboard boots the WebSocket
control plane and opens the UI.
First time? The full walk-through (prerequisites, where to
get provider keys, what anvil init will ask you, troubleshooting)
lives in docs/getting-started.md.
Provider-agnostic by design
Eight providers ship in the box. One config file picks them per
stage. Each adapter speaks the same streaming format, the same UpstreamError retry shape, the same per-call cost calculation.
| Provider | Tier slot | Best for |
|---|---|---|
| OpenCode (Zen) | local | Hosted open-coding models, $10/mo flat; replaces GPU-heavy Ollama |
| Ollama | local | Fully offline, your own GPU, embeddings + reranking |
| Claude (CLI) | cheap / premium | Best-in-class reasoning, native tool use |
| OpenAI | cheap / premium | GPT-5, o-series reasoning |
| Gemini | cheap / premium | Long context, Gemini 2.5 Pro |
| OpenRouter | any | Single key, hundreds of models |
| Google ADK | premium | When you need ADK's runner semantics |
| Gemini CLI | utility | Subprocess fallback |
One run, three providers, fourteen cents
Routing is per-stage, not per-run. The same feature can flow
through three providers without you lifting a finger:
clarify → Ollama / OpenCode local ~ $0.00
plan → Claude Sonnet deep analysis ~ $0.05
build → Ollama / OpenCode local ~ $0.00
test → Ollama / OpenCode local ~ $0.00
validate → Claude Haiku cheap + fast ~ $0.01
review → Claude Sonnet judgment-heavy ~ $0.08
ship → Ollama / OpenCode local ~ $0.00
──────────
~ $0.14
It's just YAML in ~/.anvil/stage-policy.yaml. Premium models only
appear where premium models actually matter. Read-only research
and the fix-retry loop are locked to free tier — they cannot
escalate, by design. A typical run with Ollama or OpenCode on the local
stages spends cents, not dollars, on cloud calls (about $0.14 in the
example above).
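As a rough illustration of per-stage routing, the sketch below models a stage policy as a plain map and sums the example figures from the breakdown above. The `StageAssignment` shape, the tier assigned to each stage, and the `estimateRunCostUsd` and `paidStages` helpers are assumptions made for this example. They are not Anvil's actual stage-policy.yaml schema or API.

```typescript
// Hypothetical shapes for illustration only; Anvil's real stage-policy.yaml
// schema is not shown in this document and may differ.
type Tier = "local" | "cheap" | "premium";
type Stage = "clarify" | "plan" | "build" | "test" | "validate" | "review" | "ship";

interface StageAssignment {
  tier: Tier;
  model: string;
  approxCostUsd: number; // approximate figures copied from the breakdown above
}

const examplePolicy: Record<Stage, StageAssignment> = {
  clarify: { tier: "local", model: "Ollama / OpenCode", approxCostUsd: 0.0 },
  plan: { tier: "premium", model: "Claude Sonnet", approxCostUsd: 0.05 },
  build: { tier: "local", model: "Ollama / OpenCode", approxCostUsd: 0.0 },
  test: { tier: "local", model: "Ollama / OpenCode", approxCostUsd: 0.0 },
  validate: { tier: "cheap", model: "Claude Haiku", approxCostUsd: 0.01 },
  review: { tier: "premium", model: "Claude Sonnet", approxCostUsd: 0.08 },
  ship: { tier: "local", model: "Ollama / OpenCode", approxCostUsd: 0.0 },
};

// Sum in whole cents so floating-point drift cannot skew the total.
function estimateRunCostUsd(policy: Record<Stage, StageAssignment>): number {
  let cents = 0;
  for (const stage of Object.keys(policy) as Stage[]) {
    cents += Math.round(policy[stage].approxCostUsd * 100);
  }
  return cents / 100;
}

// Stages that would leave the local tier under this policy.
function paidStages(policy: Record<Stage, StageAssignment>): Stage[] {
  return (Object.keys(policy) as Stage[]).filter((s) => policy[s].tier !== "local");
}

const exampleTotalUsd = estimateRunCostUsd(examplePolicy);
console.log(exampleTotalUsd.toFixed(2), paidStages(examplePolicy));
```

Keeping the policy as data rather than code is what makes the per-stage swap cheap: changing which tier handles a stage is a one-line edit, and the cost impact can be estimated before a run starts.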
Auto-failover when a provider misbehaves
If a model 429s, 5xx's, hits a quota wall, or fails its liveness probe
mid-run, Anvil's chain-walker burns it for the rest of the run and
falls through to the next entry in the same tier — same provider or
different, your call. The pipeline doesn't pause, doesn't surface a
stack trace to the user, and doesn't double-charge by retrying the same
broken model. Every fallback hop emits a routing event so you can see
exactly which model was skipped and why.
clarify → adk:gemini-2.5-flash ❌ (provider liveness fail)
↪ opencode/kimi-k2.6 ✅ (next in chain, same tier)
build → opencode/qwen3.5-plus ❌ (429 — Alibaba upstream)
↪ opencode/glm-5.1 ✅ (model burned for run, fallback proceeds)
Two layers of detection: a proactive liveness probe at run start
(Ollama /api/tags, env-var presence for cloud) and a reactive
duck-typed UpstreamError check on every adapter call. The per-run cap
on retry attempts is configurable in models.yaml (walker.max_attempts).
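The fall-through behaviour can be sketched as a small loop over a fallback chain. This is a minimal illustration under stated assumptions, not Anvil's chain-walker: `ChainEntry`, `RoutingEvent`, `isUpstreamError`, and `walkChain` are hypothetical names, the duck-typed check only looks for a numeric `status` field, and the call is synchronous for brevity where a real adapter call would be async.

```typescript
// Hypothetical types; Anvil's real UpstreamError and walker are not shown here.
interface ChainEntry {
  id: string;
  tier: string;
}

interface RoutingEvent {
  skipped: string;
  reason: string;
}

// Duck-typed check: any thrown object with a numeric `status` counts as upstream.
function isUpstreamError(err: unknown): err is { status: number } {
  return (
    typeof err === "object" &&
    err !== null &&
    typeof (err as { status?: unknown }).status === "number"
  );
}

function walkChain<T>(
  chain: ChainEntry[],
  burned: Set<string>, // models already burned earlier in this run
  call: (modelId: string) => T,
  maxAttempts: number,
  events: RoutingEvent[],
): { modelId: string; result: T } {
  let attempts = 0;
  for (const entry of chain) {
    if (burned.has(entry.id)) continue; // never retry a burned model
    if (attempts >= maxAttempts) break;
    attempts += 1;
    try {
      return { modelId: entry.id, result: call(entry.id) };
    } catch (err) {
      // Only rate limits and server errors trigger a fallback in this sketch.
      if (!isUpstreamError(err) || !(err.status === 429 || err.status >= 500)) throw err;
      burned.add(entry.id);
      events.push({ skipped: entry.id, reason: `upstream ${err.status}` });
    }
  }
  throw new Error("no healthy model left in chain");
}

// Usage: the first model returns 429, the walker burns it and falls through.
const burnedThisRun = new Set<string>();
const routingEvents: RoutingEvent[] = [];
const failoverResult = walkChain(
  [
    { id: "opencode/qwen3.5-plus", tier: "local" },
    { id: "opencode/glm-5.1", tier: "local" },
  ],
  burnedThisRun,
  (id) => {
    if (id === "opencode/qwen3.5-plus") throw { status: 429 };
    return "ok from " + id;
  },
  3,
  routingEvents,
);
console.log(failoverResult.modelId, routingEvents);
```

Because the burned set is passed in rather than created inside the function, later stages in the same run can share it and skip the failed model without calling it again.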
One GPU, many models — exclusive slot serialization
Big local models can't all share a single GPU at the same time.
If your clarify and build stages both want a heavy Ollama
model, naive concurrency ends in an OOM. Anvil's exclusive_slot: true flag puts those models behind a
process-local FIFO queue so only one exclusive model is ever
GPU-resident at a time.
The queue does the dance for you:
- Hard eviction on switch. Going from model A → model B
  explicitly tells Ollama to release A's weights, then polls until
  the GPU has actually freed them before letting B load. No GPU
  briefly holding both.
- Intruder detection. An out-of-band Ollama session on the host
  (e.g. you ran ollama run in another terminal) gets evicted
  before the next exclusive load, so two big models can't sneak
  in side by side.
- Embeddings + rerankers bypass. They're small enough to
  co-reside, so they never touch the queue.
- Same-model calls are free. Consecutive calls to the same id
  skip the eviction step entirely. A stage fanning out across repos
  pays the model-load cost once.
Mark a model exclusive in ~/.anvil/models.yaml:
- id: ollama/qwen2.5-coder:14b
  provider: ollama
  tier: local
  vram_gb: 9
  exclusive_slot: true # mandatory for any VRAM-heavy local
This lets you mix multiple big local models on the same machine without
manual sequencing or OOM kills, as long as each model fits in VRAM on
its own.
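The decisions the queue makes for each request can be sketched as a pure function over "which exclusive model is resident now" and "which model is asked for next". This is an illustrative sketch only: `LocalModel`, `SlotAction`, `planSlot`, and `drainQueue` are hypothetical names, the second model id and the embedder id are made up, and the real queue also talks to Ollama, polls the GPU, and handles intruder sessions, none of which is modelled here.

```typescript
// Hypothetical descriptor; only `exclusiveSlot` mirrors the exclusive_slot flag.
interface LocalModel {
  id: string;
  exclusiveSlot: boolean;
}

interface SlotAction {
  kind: "evict" | "load" | "reuse" | "bypass";
  modelId: string;
}

// Decide what must happen before `next` may run, given the resident exclusive model.
function planSlot(resident: string | null, next: LocalModel): SlotAction[] {
  // Small models (embeddings, rerankers) never touch the queue.
  if (!next.exclusiveSlot) return [{ kind: "bypass", modelId: next.id }];
  // Same model as last time: skip eviction and reload entirely.
  if (resident === next.id) return [{ kind: "reuse", modelId: next.id }];
  const actions: SlotAction[] = [];
  // A different exclusive model is resident: evict it before loading the new one.
  if (resident !== null) actions.push({ kind: "evict", modelId: resident });
  actions.push({ kind: "load", modelId: next.id });
  return actions;
}

// Process requests strictly in arrival (FIFO) order and record every action.
function drainQueue(requests: LocalModel[]): SlotAction[] {
  let resident: string | null = null;
  const log: SlotAction[] = [];
  for (const next of requests) {
    log.push(...planSlot(resident, next));
    if (next.exclusiveSlot) resident = next.id;
  }
  return log;
}

const coder: LocalModel = { id: "ollama/qwen2.5-coder:14b", exclusiveSlot: true };
const otherBigModel: LocalModel = { id: "ollama/hypothetical-big-model", exclusiveSlot: true };
const embedder: LocalModel = { id: "ollama/hypothetical-embedder", exclusiveSlot: false };

const slotLog = drainQueue([coder, coder, embedder, otherBigModel]);
console.log(slotLog.map((a) => `${a.kind} ${a.modelId}`));
```

In this run the second request for the same model is a reuse, the embedder bypasses the slot, and the switch to the other large model produces an evict followed by a load.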
Cost ledger, live
Every adapter call attaches a real gen_ai.usage.cost attribute
computed from a vendored LiteLLM pricing snapshot. The dashboard
shows you per-call, per-stage, per-run spend in real time. The
OpenTelemetry export carries the same numbers if you want them in
Langfuse, Tempo, or Honeycomb.
No estimates. No surprises.
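A per-call cost of this kind is a multiplication of token counts by per-token prices. The sketch below shows the arithmetic under stated assumptions: the field names follow LiteLLM's pricing JSON (`input_cost_per_token`, `output_cost_per_token`), but the model id and both prices are placeholders, and `callCostUsd` and `usageAttributes` are hypothetical helpers rather than Anvil functions.

```typescript
// Field names follow LiteLLM's pricing JSON; the prices below are made-up
// placeholders, not real rates for any model.
interface ModelPricing {
  input_cost_per_token: number;
  output_cost_per_token: number;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

const pricingSnapshot: Record<string, ModelPricing> = {
  "example/placeholder-model": {
    input_cost_per_token: 0.000003,
    output_cost_per_token: 0.000015,
  },
};

// Returns undefined for models missing from the snapshot instead of guessing.
function callCostUsd(
  model: string,
  usage: Usage,
  pricing: Record<string, ModelPricing>,
): number | undefined {
  const price = pricing[model];
  if (price === undefined) return undefined;
  return (
    usage.inputTokens * price.input_cost_per_token +
    usage.outputTokens * price.output_cost_per_token
  );
}

// Build the span attributes named in this README for a single call.
function usageAttributes(
  model: string,
  usage: Usage,
  pricing: Record<string, ModelPricing>,
): Record<string, string | number> {
  const attrs: Record<string, string | number> = {
    "gen_ai.request.model": model,
    "gen_ai.usage.input_tokens": usage.inputTokens,
    "gen_ai.usage.output_tokens": usage.outputTokens,
  };
  const cost = callCostUsd(model, usage, pricing);
  if (cost !== undefined) attrs["gen_ai.usage.cost"] = cost;
  return attrs;
}

const exampleAttrs = usageAttributes(
  "example/placeholder-model",
  { inputTokens: 1000, outputTokens: 200 },
  pricingSnapshot,
);
console.log(exampleAttrs);
```

Leaving the cost attribute off for unknown models, rather than writing zero, keeps a missing price visible in the ledger instead of silently under-reporting spend.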
What you can do with Anvil
- Pipeline. Nine-stage feature pipeline: clarify, plan, build, test, validate, …
- Plan. Generates a structured markdown plan before any code is written. …
- PR Review. Multi-pass automated review with evidence gates, incident binding, …
- Memory. Long-term project memory with five types: working, episodic, … 📊 83.4% on LoCoMo (full 1,540-question set, …
- Project. Multi-repo first. One …
- Knowledge Base. AST-aware chunking via tree-sitter, hybrid retrieval (vector + …
- Settings. Provider keys, model registry, stage policy, OTel endpoint: all …
- Convention. Extracts your codebase's real conventions: naming, imports, …
- History. Every run, replayable. Diffs, PR URLs, reviewer verdicts, cost …
- Research. Read-only investigation: "what does this service do?" or …
- Bug Fix. Targeted fix workflow with a tight retry loop. Locked to local + …
- Observability. OpenTelemetry spans with GenAI semantic conventions. Plug in …
- Auto-failover. Provider goes down, hits a quota wall, or fails its liveness probe? …
- Resume + rollback. Every run is checkpointed per stage. Resume from the failing stage …
- MCP server. Anvil's knowledge-base retriever ships as a standalone MCP server. …
- Durable execution (new in 0.3.0). Pattern-2 durable execution. Every step and every side effect is …
- Race arbitration (new in 0.3.0). Multi-process lease arbitration. Each run holds a TTL'd lease in …
- Policy gates (new in 0.3.0). Policy editor lives at …
Observability (opt-in)
Telemetry is off by default. When you turn it on, every adapter
call emits an OpenTelemetry span with GenAI semantic conventions —
prompt + completion tokens, cost, latency, model, provider, error
class. Plug in any OTLP-compatible backend.
Two switches, one env var each
# 1. Export to a real OTLP collector — Langfuse, Tempo, Honeycomb, …
echo 'OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:3000/api/public/otel/v1/traces' >> ~/.anvil/.env
echo 'OTEL_SERVICE_NAME=anvil-dashboard' >> ~/.anvil/.env
# 2. Or dump spans to stderr — useful for debugging without a collector
echo 'ANVIL_OTEL_CONSOLE=1' >> ~/.anvil/.env
Restart the dashboard and traces start flowing.
Privacy + noise controls
| Variable | Default | What it does |
|---|---|---|
| ANVIL_OTEL_DISABLED | unset | Hard kill-switch: set to 1 to disable everything |
| ANVIL_OTEL_RECORD_CONTENT | 0 | Set 1 to include prompt + completion text on spans (truncated to 8 KB per attribute) |
| OTEL_LOG_LEVEL | NONE | Set to ERROR / INFO / DEBUG to surface SDK errors when debugging |
| ANVIL_OTEL_BATCH | unset | Set 1 to batch span exports (lower IO, slightly delayed arrival) |
By default, spans carry structure but not content — model, cost,
latency, error class, all attached. Prompts and completions stay on
disk only.
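How these switches might combine can be sketched as a small settings reader. This is an assumption-laden illustration: `OtelSettings`, `readOtelSettings`, and `contentAttribute` are hypothetical names, the environment is passed in as a plain object, the Langfuse auto-detection described later is ignored, and the 8 KB cap is approximated by string length rather than encoded bytes.

```typescript
// Hypothetical settings shape; Anvil's internal config object is not shown here.
interface OtelSettings {
  enabled: boolean;
  recordContent: boolean;
  batch: boolean;
  consoleDump: boolean;
  endpoint: string | undefined;
  logLevel: string;
}

type Env = Record<string, string | undefined>;

function readOtelSettings(env: Env): OtelSettings {
  const endpoint = env["OTEL_EXPORTER_OTLP_ENDPOINT"];
  const consoleDump = env["ANVIL_OTEL_CONSOLE"] === "1";
  const killSwitch = env["ANVIL_OTEL_DISABLED"] === "1";
  // Off by default: something has to ask for an export target.
  const wantsExport = (endpoint !== undefined && endpoint !== "") || consoleDump;
  return {
    enabled: !killSwitch && wantsExport, // the kill-switch wins over everything
    recordContent: env["ANVIL_OTEL_RECORD_CONTENT"] === "1",
    batch: env["ANVIL_OTEL_BATCH"] === "1",
    consoleDump,
    endpoint,
    logLevel: env["OTEL_LOG_LEVEL"] ?? "NONE",
  };
}

// Approximation: caps by UTF-16 length, not by encoded byte size.
const MAX_CONTENT_CHARS = 8 * 1024;

// Returns the text to attach to a span, or undefined when content must stay off spans.
function contentAttribute(text: string, settings: OtelSettings): string | undefined {
  if (!settings.enabled || !settings.recordContent) return undefined;
  return text.length > MAX_CONTENT_CHARS ? text.slice(0, MAX_CONTENT_CHARS) : text;
}

const defaultSettings = readOtelSettings({});
const consoleOnly = readOtelSettings({ ANVIL_OTEL_CONSOLE: "1" });
console.log(defaultSettings.enabled, consoleOnly.enabled);
```

Routing every content decision through one function makes the privacy default easy to audit: prompt text can only reach a span when telemetry is on and the record-content switch is set.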
Quick local stack: Langfuse
Anvil ships a tuned Langfuse compose file at infra/observability/: Langfuse 3.x +
Postgres + ClickHouse + Redis + MinIO, pre-wired for the OTLP HTTP
endpoint Anvil exports to. No external clone needed:
# Spin up the bundled stack on http://localhost:3000
docker compose -f infra/observability/docker-compose.yml up -d
# In ~/.anvil/.env (use the keys you create in the Langfuse UI)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:3000/api/public/otel/v1/traces
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer pk-lf-...
OTEL_SERVICE_NAME=anvil-dashboard
Anvil's dashboard auto-detects the local Langfuse on port 3000 — if
it's running and you haven't set OTEL_EXPORTER_OTLP_ENDPOINT
yourself, the dashboard wires it up automatically. Tear down with docker compose -f infra/observability/docker-compose.yml down -v.
What you'll see
- One anvil.agent.session parent span per pipeline stage,
  linking every adapter call and resume into a single trace.
- A gen_ai.invoke child span per LLM call, with gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and a real gen_ai.usage.cost
  in USD.
- gen_ai.tool.<name> child spans for every tool call the agent
  makes, closed when the matching tool_result arrives.
- A routing-decision attribute group (anvil.routing.*) on the
  invoke span so you can see why a particular model was picked, and
  which models got burned mid-run.
The OTLP export carries the same numbers the dashboard's cost panel
shows. One source of truth.
How it all fits together
Anvil is a TypeScript monorepo. Each package owns one concern; the
dashboard ties them together.
┌────────────────────────┐
│ anvil dashboard │ the control plane
│ (React + WebSocket) │
└────────────┬───────────┘
│ orchestrates
▼
┌───────────────────────────────────────┐
│ pipeline-runner │
│ 9-stage walker · per-repo fan-out │
│ validate-fix loop · chain-fallback │
└───┬──────┬──────┬──────────┬──────┬───┘
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌──────┐ ┌────┐ ┌──────┐ ┌──────┐ ┌──────────┐
│agent-│ │core│ │knwldg│ │memory│ │convention│
│ core │ │pipe│ │ core │ │ core │ │ -core │
└──┬───┘ └─┬──┘ └───┬──┘ └───┬──┘ └────┬─────┘
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
~/.anvil/ · models.yaml · stage-policy.yaml
· runs/<id>/ · features/<slug>/
· knowledge-base/<project>/
· memories/ · conventions/
Three different fronts ride on the same engine:
- anvil CLI: init, doctor, dashboard (the front door)
- Dashboard: full pipeline control with live agent activity
- code-search-mcp, the standalone code-search product:
  MCP server, the code-search CLI (index / query / status / reset / daemon / serve / mcp), and the code-search-daemon long-running indexer. Three bins, one
  install. Works without the Anvil agent stack.
Per-package deep dives
| Package | What it owns |
|---|---|
| @esankhan3/anvil-cli | CLI entry point + bundled dashboard |
| @anvil-dev/dashboard | React UI + WebSocket pipeline orchestrator (private; bundled into the CLI tarball) |
| @esankhan3/anvil-agent-core | 8 LLM adapters, router, cost, OTel |
| @esankhan3/anvil-core-pipeline | Typed Step<I,O> graph + EventBus + hooks |
| @esankhan3/anvil-knowledge-core | AST chunks, graph, hybrid retrieval |
| @esankhan3/anvil-memory-core | Five-type memory, bi-temporal, drift detection |
| @esankhan3/anvil-convention-core | Convention extractor + promotion ledger |
| @esankhan3/code-search-mcp | Standalone code-search: MCP server + code-search CLI + code-search-daemon |
Configuration
Three files run the show, all in ~/.anvil/:
| File | What it does |
|---|---|
| .env | Provider keys + observability switches |
| models.yaml | The model registry: local, cheap, premium tiers |
| stage-policy.yaml | Which tier handles which pipeline stage |
Working examples live in examples/anvil-home/.
Bootstrap with:
cp examples/anvil-home/.env.example ~/.anvil/.env && chmod 600 ~/.anvil/.env
cp examples/anvil-home/models.yaml ~/.anvil/models.yaml
cp examples/anvil-home/stage-policy.yaml ~/.anvil/stage-policy.yaml
anvil init does the equivalent for models.yaml automatically.
Project setup examples
Three opinionated starters in examples/:
- TypeScript monorepo: Next.js storefront + Express API, Postgres, Redis
- Go microservices: multi-service Go workspace
- Python ML: training + serving split
Copy a factory.yaml, adjust paths, and anvil init against your
own workspace.
Built with
We rely on the best of the open ecosystem:
tree-sitter · LanceDB · graphology · OpenTelemetry · Model Context Protocol · React · Vite · commander
Packages
The monorepo publishes a single user-facing CLI plus the building blocks
it sits on top of. Every package below is published with npm provenance
(sigstore attestation linking the tarball back to this repo). Run
npm audit signatures after installing to verify the chain.
| Package | Purpose |
|---|---|
| @esankhan3/anvil-cli | The user-facing CLI + bundled dashboard. Run npx @esankhan3/anvil-cli to start. |
| @esankhan3/anvil-agent-core | Shared LLM stack: unified LanguageModel interface, provider adapters, agent subprocess machinery, cost calc. |
| @esankhan3/anvil-knowledge-core | AST chunking, tree-sitter parsing, embeddings, LanceDB vector store, hybrid retrieval. |
| @esankhan3/anvil-memory-core | Long-term memory: five-type taxonomy, bi-temporal facts, drift detection, sleeptime ratification. 83.4% on LoCoMo. |
| @esankhan3/anvil-convention-core | Convention extraction, rule engine, promotion ledger. |
| @esankhan3/anvil-core-pipeline | Typed Step<I,O> graph, EventBus, StepRegistry, lifecycle hooks. |
| @esankhan3/code-search-mcp | Standalone code-search product: MCP server + code-search CLI + code-search-daemon (file-watcher + UDS JSON-RPC). |
The dashboard (@anvil-dev/dashboard) is bundled inside the CLI; it is
not published as a standalone npm package.
Status
- MVP 2 (Active). The dashboard is the canonical interface. The CLI ships …
- Stable. Pipeline orchestration · multi-provider routing · knowledge …
In flight: richer plan validators · deeper RAG-eval ·
additional MCP tools · cost-policy enforcement (UI scaffolded —
ships in the next minor) · notification channels (Slack + email).
License
MIT — bring it to your team, fork it, ship it.
No hosted plan. No telemetry sent to us.
Your code, your keys, your budget. That's the deal.
Built for engineers who want their AI tools to respect their stack and their wallet.
Crafted by Esan Mohammad
