maestro
Health Uyari
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Uyari
- process.env — Environment variable access in eval/run.ts
Permissions Gecti
- Permissions — No dangerous permissions requested
Bu listing icin henuz AI raporu yok.
Open-source Fugu: the open-source LLM orchestration brain. Route any models (open and closed) behind one OpenAI and Anthropic compatible endpoint, cheap-first with verify/escalate and full cost transparency. Self-hostable. Early v0.1. MIT.
🎼 Maestro
The open-source orchestration brain for LLMs.
Route any models (open and closed) behind one OpenAI-compatible endpoint.
Cheap-first, verify, escalate. Full cost and route transparency. Self-hostable. MIT.
🐡 Maestro is the open-source version of Sakana's Fugu, built from the two papers: TRINITY (2512.04695) and Conductor (2512.04388).
Open-source Fugu: open, honest, EU-clean, runs anywhere, and not locked to three closed models.
The one thing no other open-source project does: Maestro is the only open-source LLM router you can self-host with zero setup and no GPU (bring one API key) that is both OpenAI- and Anthropic-compatible (works in Claude Code and opencode) with per-request cost transparency and a reproducible benchmark. OpenFugu needs a GPU; LoRA-Harness needs a GPU. See COMPARISON.md.
⚠️ Status: early. This is an honest ~5-hour build (v0.1). The core works and is tested live on real models (routing, verify/escalate, tool/agent loops, OpenAI + Anthropic APIs, streaming, transparency). It is NOT production-hardened yet, and the learned router is not built. Treat it as a strong, runnable foundation, not a finished product. See the Roadmap for what is next and VERIFICATION.md for exactly what is proven vs assumed.
Modern LLM stacks have a problem: the best model for a one-line translation is not the best model for a distributed-systems design, but you pay frontier prices for both. Routers exist (Sakana's Fugu is the famous one), but they're closed, opaque about cost, region-locked, and limited to a fixed pool.
Maestro is just the routing brain. It doesn't host models or run a GPU. It sits in front of an AI gateway (OpenRouter, the Vercel AI Gateway, or your local Ollama/vLLM) and, for every request, decides which model, at what effort, in how many turns: it tries a cheap model first, verifies the answer, and escalates to a stronger one only when needed. Then it tells you exactly what it did.
// every response carries a `maestro` block: this is the whole point
"maestro": {
"route": [
{ "turn": 1, "model": "deepseek/deepseek-v4-pro", "verdict": "REVISE" },
{ "turn": 2, "model": "anthropic/claude-opus-4.8", "verdict": "ACCEPT" }
],
"classify": { "task": "code", "difficulty": 0.78, "confidence": 0.85 },
"cost_usd": 0.0182,
"cost_vs_frontier_only_usd": 0.0241,
"savings_pct": 24
}
30-second quickstart
Bring one provider key (OpenRouter unlocks the whole catalog), start the server, and point your OpenAI client at it:
export OPENROUTER_API_KEY=sk-or-... # one key, open + closed models (BYOK)
# or AI_GATEWAY_API_KEY=... # Vercel AI Gateway (also serves Fugu)
# or LOCAL_OPENAI_BASE_URL=http://localhost:11434/v1 # Ollama / vLLM / llama.cpp, fully local
npx openmaestro serve # or: docker run -p 8080:8080 ghcr.io/youruser/maestro
curl -s localhost:8080/v1/chat/completions \
-H 'content-type: application/json' \
-d '{"model":"maestro-auto","messages":[{"role":"user","content":"Translate good morning to Spanish"}]}' \
| jq .maestro
Your existing OpenAI code doesn't change, just the base URL:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
client.chat.completions.create(
model="maestro-auto", # or maestro-fugu, or a real model id (passthrough)
messages=[{"role": "user", "content": "Design a multi-region rate limiter."}],
)
Use it in your coding agent
Maestro speaks both the OpenAI and the Anthropic wire formats, so it drops into the popular agents. Your harness keeps its own tool loop; Maestro just picks the model behind it.
Claude Code (Maestro exposes /v1/messages):
ANTHROPIC_BASE_URL=http://localhost:8080 ANTHROPIC_API_KEY=unused claude
opencode / Cursor / Continue / any OpenAI-compatible tool: point the base URL at Maestro and use model maestro-auto:
// opencode.json
{ "provider": { "maestro": { "npm": "@ai-sdk/openai-compatible",
"options": { "baseURL": "http://localhost:8080/v1" },
"models": { "maestro-auto": {} } } } }
Tool calls pass straight through. Maestro is a transparent proxy (like OpenRouter): it decides which model handles each call and returns the model's output (text or tool_calls) verbatim. Your app executes the tools and calls Maestro again with the result. Full setup for opencode / Claude Code / Cursor / Continue: docs/INTEGRATIONS.md.
Swap in your own models
The model pool is 100% yours: it's a dated JSON registry of ModelSpecs. Copy maestro.config.example.json, edit it, and run with MAESTRO_REGISTRY=./maestro.config.json. Slot labels are abstract and remappable (the key lesson from Fugu): the router reasons over slots/tiers, and you map them to any model id on any provider with no retraining and no code change. Mix open, closed, and your local Ollama models freely. The defaults track the current frontier and are dated, so maestro registry check warns when they go stale.
Why Maestro
| Maestro | Sakana Fugu | Raw gateway (OpenRouter/Vercel) | |
|---|---|---|---|
| Open source / self-host | ✅ MIT | ❌ closed API | ⚠️ hosted |
| Model pool | any (open + closed, BYOK) | 3 closed models | any (but no routing) |
| Routing brain | ✅ cheap then verify then escalate | ✅ learned | ❌ you pick |
| Cost transparency | ✅ per-request, per-model | ❌ undisclosed | ⚠️ totals only |
| Refuses to overspend | ✅ confidence/verify gate | ⚠️ | ❌ |
| Run 100% locally | ✅ local-openai adapter | ❌ | ❌ |
| EU / region control | ✅ region filter | ❌ EU-blocked | ⚠️ |
| Drop-in OpenAI API | ✅ | ✅ | ✅ |
How it works
flowchart LR
A[OpenAI request<br/>model: maestro-auto] --> B[classify<br/>task / difficulty / caps]
B --> C[route<br/>filter then tier then guardrail score<br/>then escalation ladder]
C --> D[execute worker<br/>via gateway / local]
D --> E{verify}
E -- ACCEPT --> F[answer + maestro block]
E -- REVISE and budget left --> G[escalate up] --> D
C -. reads .-> R[(dated model registry<br/>slots to model ids)]
D -. logs .-> T[(trace + cost)]
- Classify: a fast, zero-cost heuristic tags the task (code/math/reasoning/...), estimates difficulty, and required capabilities.
- Route: filter the registry by capability/policy/region/budget, pick a starting tier by difficulty, rank with a guardrail score, build an escalation ladder of
(model, effort). - Execute: call the chosen model through the provider adapter (streaming supported).
- Verify: a verifier judges the answer.
ACCEPTstops;REVISEescalates to the next rung (bounded bymaxTurns). For code/tools, a real executable check is on the roadmap. - Report: the answer ships with the full route, per-model tokens, cost, and a "what frontier-only would have cost" comparison.
We make no dishonest claims. Maestro does not promise it always beats a single frontier model. It promises to try cheaper first, verify, escalate when needed, and show you the receipts.
Benchmarks
Reproduce locally with npm run eval (runs offline, deterministic, free; routes over the priced registry while executing on the mock provider, graded against ground-truth difficulty with oracle-route regret):
strategy pass% mean $ pass/$ regret $ fails
----------------------------------------------------------------------
maestro 92% 0.00053 1747.9 0.00035 2
best-single 100% 0.01507 66.3 0.01421 0
cheapest-single 56% 0.00016 3566.0 0.00000 11
random-route 88% 0.00689 127.7 0.00705 3
calibration (classifier confidence vs first-rung-correct):
Brier = 0.181 ECE = 0.110 (lower is better)
Read: Maestro reaches 92% of best-single quality at 97% lower mean cost (~26x more successful answers per dollar, 1748 vs 66), and beats a random router on both quality (+4 pts) and cost (13x cheaper). cheapest-single is cheaper per call but fails 44% of tasks; Maestro is the best balance. The gap to best-single and the 2 failures are real (the heuristic classifier sometimes mis-estimates); we show them on purpose. This is a routing benchmark on a realistic-mix fixture set, not a leaderboard claim.
Endpoints
| Method | Path | Purpose |
|---|---|---|
POST |
/v1/chat/completions |
OpenAI-compatible (+ stream). model selects the mode. |
POST |
/v1/messages |
Anthropic-compatible (+ stream), drop-in for Claude Code. |
GET |
/v1/models |
maestro-auto/fugu/ultra + every registry model. |
POST |
/v1/route |
Dry-run the router: see the decision, no model call. |
GET |
/v1/traces/:id |
Inspect a past request's full trace. |
GET |
/healthz |
Liveness + which providers are configured. |
Modes (the model field): maestro-auto (classify + route + verify), maestro-fugu (single worker + verify, low-latency), maestro-ultra (multi-step decompose, roadmap, falls back to fugu), or any real model id for passthrough (no routing, always a safe drop-in).
Configuration
All via env (zero-config defaults shown):
| Variable | Default | Meaning |
|---|---|---|
MAESTRO_PORT |
8080 |
server port |
MAESTRO_DEFAULT_MODE |
fugu |
mode when model is maestro |
MAESTRO_MAX_TURNS |
3 |
max escalation turns |
MAESTRO_DIFFICULTY_LOW / _HIGH |
0.33 / 0.7 |
tier thresholds |
MAESTRO_VERIFY |
true |
enable the verify/escalate loop |
MAESTRO_REQUEST_TIMEOUT_MS |
120000 |
per-request upstream timeout (no hangs) |
MAESTRO_REGISTRY |
(built-in) | path to your own model registry JSON |
OPENROUTER_API_KEY / AI_GATEWAY_API_KEY / LOCAL_OPENAI_BASE_URL |
- | providers (BYOK) |
MAESTRO_FORCE_MOCK |
false |
demo: route priced registry, execute mock |
MAESTRO_TRACE_FILE |
- | append every trace as JSONL |
Per-request overrides via the maestro field: { "budget": 0.02, "maxTurns": 2, "verify": true, "region": "eu", "pin": "anthropic/claude-opus-4.8" }.
CLI
maestro serve [--port N] # start the OpenAI-compatible server
maestro route "<prompt>" # dry-run the router (no model call)
maestro registry check # report registry staleness
Roadmap
This is v0.1, a ~5-hour build. It works and is tested, but there is a lot of headroom. Contributions welcome.
Done (v0.1)
- OpenAI-compatible endpoint (
/v1/chat/completions,/v1/models,/v1/route,/v1/traces/:id,/healthz) - Anthropic-compatible
/v1/messages(works in Claude Code) - Two clean paths: concrete-model passthrough +
maestro-*routed - Classify, route (tier + guardrail + escalation ladder), verify/escalate loop
- Transparent tool-calling pass-through (agent loops keep their own loop)
- Full request pass-through (
response_format,provider,seed,session_id,metadata,trace,reasoning,plugins, ...) + upstream response preservation - OpenRouter / Vercel / local / mock adapters, BYO key, per-request timeouts
- Cost-aware routing + transparency block + offline reproducible eval (43 tests)
Next (v0.2, hardening)
- Incremental indexed streaming tool-call deltas (currently best-effort)
- Full Anthropic
tool_useblock mapping for/v1/messages - Retries + provider fallback on 429/5xx; circuit breaking
- Trace redaction (tool arguments / PII) + opt-in persistence (SQLite)
- Auth (API keys), rate limits, per-org budgets
- Latency: run the verifier async, or skip it when the classifier is highly confident
Later (v1, smarter)
- Executable verifier for code/tools (run the tests, do not ask an LLM)
- Semantic cache; prompt/version registry; a tiny trace-viewer UI
- Orchestration profiles (cheap / balanced / quality) + per-task best-model presets across ~40 models
- Large, long-running, tool-calling agent eval (success = task completion, not text match)
Research (v2-v3)
- Learned TRINITY-style router (frozen Qwen3-0.6B + tiny head, CPU-only, trained offline) as a drop-in for the heuristic classifier
-
maestro-ultra: multi-step decomposition (Conductor-style) behind a quality gate - Custom direct providers (Groq/Together/Fireworks/...) beyond the OpenRouter-first default
FAQ
Isn't this just a wrapper? The wrapper part (gateways, OpenAI serving) is deliberately not reinvented; that is solved. The value is the routing policy + verify/escalate loop + transparency + a reproducible eval. The roadmap's learned router is where it stops being "just a wrapper".
Does it really save money? On the bundled realistic-mix eval: ~97% cheaper than always-frontier (~26x better cost-per-success), and it beats random routing on both quality and cost. Your mileage depends on your traffic mix and registry, which is why the eval is in the repo and the cost is on every response.
Is it better than <top single model>? Maestro isn't a model, it's the router. It uses the best models (including frontier ones) and reaches frontier-level results at a fraction of the cost by sending only the hard requests to the expensive models. The honest claim is "frontier-quality routing, open-source, far cheaper". Reproduce it yourself with npm run eval (offline) or bash scripts/verify.sh with your own keys.
Can I run it fully offline? Yes. Set LOCAL_OPENAI_BASE_URL to your Ollama/vLLM/llama.cpp server. Maestro never hosts a model itself.
Acknowledgements
Maestro is an open-source implementation of the ideas in Sakana's Fugu, described in two papers included in papers/: TRINITY (arXiv 2512.04695, the route-then-verify loop) and Conductor (arXiv 2512.04388, multi-step decomposition). The cheap-then-escalate pattern descends from FrugalGPT; community reverse-engineering: OpenFugu. Full design notes and our analysis (with corrections against Sakana's official report) are in docs/.
Maestro is not affiliated with Sakana AI. "Fugu" is Sakana AI's product/research; Maestro contains no Sakana code or weights. It is an independent open-source build from the public papers.
License
MIT.
Yorumlar (0)
Yorum birakmak icin giris yap.
Yorum birakSonuc bulunamadi