router
One endpoint. Every model. Always the right one.
A drop-in proxy for Anthropic, OpenAI, and Gemini that picks the best model
for every request, using a tiny on-box embedder, not a vibes-based prompt.
Built by Weave, the #1 engineering intelligence platform,
loved by Robinhood, PostHog, Reducto, and hundreds of others.
What it does
Point Claude Code, Cursor, or your own app at localhost:8080. The router:
- 🎯 Routes per request. A cluster scorer derived from
  Avengers-Pro [^1] picks the right model from your enabled providers, every turn.
- 🔌 Speaks everyone's API. Anthropic Messages, OpenAI Chat Completions,
  Gemini native. Streaming, tools, vision, the works.
- 🧠 Knows OSS too. DeepSeek, Kimi, GLM, Qwen, Llama, Mistral via
  OpenRouter (or any OpenAI-compatible endpoint).
- 🔒 BYOK by default. Provider keys stay on your box, encrypted at rest.
- 📊 Observable. OTLP traces out of the box. See them in the Weave
  dashboard (http://localhost:8080/ui/dashboard) or drop in Honeycomb, Datadog,
  Grafana, whatever.
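The per-request routing idea can be sketched in a few lines: embed the prompt, find the nearest cluster centroid, and pick that cluster's best-scoring model. Everything below is a toy illustration of that shape only; the centroids, model names, and the hash-based "embedder" are made up, and the real router uses a learned embedder and Avengers-Pro-style performance profiles, not this.

```python
import math

def toy_embed(text: str, dim: int = 8) -> list[float]:
    # Deterministic stand-in for a real sentence embedder (illustrative only).
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) + i) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Each cluster: (centroid embedding, best model for that cluster). Hypothetical.
CLUSTERS = {
    "code": (toy_embed("write a python function"), "claude-sonnet-4-5"),
    "chat": (toy_embed("hello how are you"), "gpt-4o-mini"),
}

def route(prompt: str) -> str:
    # Nearest-centroid lookup: the cluster scorer in miniature.
    emb = toy_embed(prompt)
    _, model = max(CLUSTERS.values(), key=lambda c: cosine(emb, c[0]))
    return model
```

The actual scorer also weighs a performance-versus-cost tradeoff per cluster (see the Avengers-Pro paper in the footnote); this sketch keeps only the nearest-centroid step.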
30-second quickstart
The fastest way: point Claude Code at the hosted Weave Router with one
command. No clone, no Docker, no Postgres.
npx @workweave/router
That's it. The installer walks you through scope (user vs. project), grabs
a router key, and wires Claude Code. Other flavors:
npx @workweave/router --scope project # per-repo, commits settings.json
npx @workweave/router --local # self-hosted localhost:8080
npx @workweave/router --base-url https://router.acme.internal
npx @workweave/[email protected] # pin a version
Requires Node ≥ 18 and jq. Full flag reference: install/npm/README.md.
Or: self-host the whole stack
If you want the router (and dashboard) running on your own box:
# 1. Drop a provider key in. OpenRouter is the recommended baseline.
echo "OPENROUTER_API_KEY=sk-or-v1-..." >> .env.local
# 2. Boot Postgres + router on :8080 and seed an rk_ key.
make full-setup
The router is up at http://localhost:8080, the dashboard at
http://localhost:8080/ui/ (password: admin), and your rk_... key
prints in the logs.
# Call it like Anthropic
curl -sS http://localhost:8080/v1/messages \
-H "Authorization: Bearer rk_..." \
-d '{"model":"claude-sonnet-4-5","max_tokens":256,
"messages":[{"role":"user","content":"hi"}]}'
# ...or like OpenAI
curl -sS http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer rk_..." \
-d '{"model":"gpt-4o-mini",
"messages":[{"role":"user","content":"hi"}]}'
# Peek at the routing decision without proxying
curl -sS http://localhost:8080/v1/route -H "Authorization: Bearer rk_..." -d '...'
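The curl calls above translate directly to any HTTP client: JSON body in, rk_ key as a Bearer token. A minimal stdlib-Python sketch, assuming a local router at :8080; `build_router_request` and the `rk_demo` key are illustrative names, and the request is only constructed here, not sent.

```python
import json
import urllib.request

def build_router_request(path: str, body: dict, router_key: str,
                         base: str = "http://localhost:8080") -> urllib.request.Request:
    # Every routed endpoint takes a JSON body plus the rk_ router key
    # (not an upstream provider key) as a Bearer token.
    return urllib.request.Request(
        base + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {router_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Anthropic-shaped call, mirroring the curl example above:
req = build_router_request(
    "/v1/messages",
    {"model": "claude-sonnet-4-5", "max_tokens": 256,
     "messages": [{"role": "user", "content": "hi"}]},
    "rk_demo",
)
# urllib.request.urlopen(req) would send it to a running router.
```

Swap the path to /v1/chat/completions with an OpenAI-shaped body, or /v1/route to inspect the decision, and the rest is identical.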
Wire it into your tools
Claude Code. Run make install-cc to wire Claude Code to the local
self-hosted router (it's also invoked automatically at the end of
make full-setup). For the hosted router, use npx @workweave/router
as above.
Cursor (early beta; performance may vary). Settings →
Models → Override OpenAI Base URL → http://localhost:8080/v1, then
paste rk_... as the API key.
Two keys, don't mix them up:
- sk-or-... / sk-ant-... / sk-... = your upstream provider key. Lives in .env.local.
- rk_... = your router key. Clients send this as a Bearer token.
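A quick way to internalize the split: the key prefix tells you where a key belongs. A tiny illustrative helper (`classify_key` is my name, the prefixes are the ones from this README):

```python
def classify_key(key: str) -> str:
    # rk_ keys are sent by clients to the router as Bearer tokens.
    # sk- keys are upstream provider credentials: they live in .env.local
    # on the router box and should never appear in client requests.
    if key.startswith("rk_"):
        return "router"
    if key.startswith(("sk-or-", "sk-ant-", "sk-")):
        return "provider"
    return "unknown"
```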
Endpoints
| Endpoint | Format |
|---|---|
| POST /v1/messages | Anthropic Messages, routed |
| POST /v1/chat/completions | OpenAI Chat Completions, routed |
| POST /v1beta/models/:action | Gemini generateContent, routed |
| POST /v1/route | Returns the decision, no upstream call |
| GET /v1/models · POST /v1/messages/count_tokens | Anthropic passthrough |
| GET /health · GET /validate | Liveness + key check |
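Because each routed endpoint mirrors its vendor's native path, a client only needs to know which dialect it speaks. A small illustrative lookup (the function and dict names are mine; the paths come from the table above):

```python
ROUTED_PATHS = {
    # client dialect -> routed endpoint path (from the table above)
    "anthropic": "/v1/messages",
    "openai": "/v1/chat/completions",
    "gemini": "/v1beta/models/:action",  # :action is e.g. generateContent
}

def routed_path(dialect: str) -> str:
    # Resolve the endpoint for a given API dialect, failing loudly on typos.
    try:
        return ROUTED_PATHS[dialect]
    except KeyError:
        raise ValueError(f"unknown dialect: {dialect!r}") from None
```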
Deeper docs
- 📐 Configuration reference: every env var, BYOK encryption, OTel knobs,
  cluster routing.
- 🛠️ Contributing: layering rules, hot-reload dev, migrations, tests,
  the whole engineering loop.
- 🏗️ Architecture: package layout, import contracts, recipes for adding
  endpoints / providers / strategies.
Roadmap
- Token-aware rate limiting (Redis sliding window per installation)
- Sub-installations for tenant hierarchies
- Speculative dispatch + hedging for tail latency
[^1]: Zhang, Y. et al. Beyond GPT-5: Making LLMs Cheaper and Better via
Performance–Efficiency Optimized Routing (Avengers-Pro).
arXiv:2508.12631, 2025. https://arxiv.org/abs/2508.12631