⚡ Alvus

~5 MB binary. Zero dependencies. Zero 429s.
A lightweight Go proxy that silently absorbs rate limit errors and keeps your AI agent running.

The Problem

You're in the middle of an agentic session — OpenClaw is halfway through a task, Cline is on a roll, your agent is doing things — and then:

Error: 429 Too Many Requests

The loop breaks. Context is lost. You're staring at a spinner.

If you use free-tier providers like NVIDIA NIM, this happens constantly. Free keys cap around 40 RPM. One productive session burns through that in seconds.

The Solution

Alvus sits between your agent and the upstream API. You give it a pool of keys. It handles everything else — round-robin distribution, per-key cooldowns, automatic retries, streaming passthrough. Your agent never sees a 429.

Any OpenAI-compatible agent or IDE
              │
              ▼
   ┌─────────────────────┐
   │        Alvus        │  ← localhost:3000
   │                     │
   │  [key1] ✅ ready    │
   │  [key2] ✅ ready    │  ──→  NVIDIA NIM / any OpenAI-compatible API
   │  [key3] ❄️ cooling  │
   └─────────────────────┘

3 keys × 40 RPM = up to 120 effective RPM. The math is simple. The setup is simpler.
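
To make the round-robin and cooldown idea concrete, here is a minimal sketch of a key pool. It is not Alvus's actual code: the `Pool` type, its methods, and `ErrAllCooling` are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrAllCooling is returned when no key is currently usable.
var ErrAllCooling = errors.New("all keys are cooling down")

// Pool hands out keys round-robin, skipping any that are cooling down.
// Hypothetical type for illustration, not Alvus's real internals.
type Pool struct {
	mu       sync.Mutex
	keys     []string
	coolTill []time.Time // zero value means "ready"
	next     int         // where the next search starts
}

// NewPool builds a pool in which every key starts out ready.
func NewPool(keys ...string) *Pool {
	return &Pool{keys: keys, coolTill: make([]time.Time, len(keys))}
}

// Pick returns the index and value of the next ready key.
func (p *Pool) Pick() (int, string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	now := time.Now()
	for i := 0; i < len(p.keys); i++ {
		idx := (p.next + i) % len(p.keys)
		if !now.Before(p.coolTill[idx]) {
			p.next = (idx + 1) % len(p.keys)
			return idx, p.keys[idx], nil
		}
	}
	return -1, "", ErrAllCooling
}

// Cool rests a key for the given number of seconds, e.g. after a 429.
func (p *Pool) Cool(idx, seconds int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.coolTill[idx] = time.Now().Add(time.Duration(seconds) * time.Second)
}
```

With three keys and the third one cooling, successive `Pick` calls alternate between the first two, which is the state the diagram above shows.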

Idle RAM usage: ~2 MB. Alvus is a single static binary with no external runtime to install. It won't compete with your models for memory.

Works With Everything

If it speaks OpenAI-compatible API, it works with Alvus.

Tool	Type	Setup
OpenClaw	AI agent	Set base URL in provider config
PicoClaw	Lightweight agent	Set `api_base` in config.json
Nanobot	Lightweight agent	Set `api_base` in config.yaml
Cline	VS Code agent	OpenAI Compatible provider
Cursor	IDE	Base URL override in settings
Aider	CLI agent	`--openai-api-base` flag
Any OpenAI-compatible client	—	Point at `http://localhost:3000/v1`

Features


🔑 Key pool	Multiple keys, one endpoint. Distribute load transparently
🔄 Round-robin	Even distribution across all healthy keys
🚫 Silent retry on 429/502/503	Failed key enters cooldown, request retries instantly with the next
⏱️ Retry-After support	Respects upstream `Retry-After` headers — no blind fixed waits
🔑 Auto-disable on 401/403	Invalid or revoked keys are permanently removed from the pool
📡 Streaming passthrough	SSE and chunked responses piped with zero buffering overhead
❤️ Health endpoint	`GET /health` shows live key status, cooldown timers, and requests/minute
🖥️ Interactive Dashboard	`GET /dashboard` — Premium Glassmorphism Dark UI for real-time monitoring
⚡ Live Activity Logs	Searchable, 1000-entry memory cache to track all request activity
🔧 Dynamic Configuration	Update keys and base URLs directly from the dashboard; writes to `.env`
🪶 Zero dependencies	Pure Go stdlib. One file. One binary
🔧 `.env` support	Built-in parser — no `godotenv`, no extras
🖥️ Runs anywhere	linux/amd64, arm64, arm, 386 — including Pi Zero and older x86 hardware
💾 ~2 MB idle RAM	Static binary, no external runtime; won't compete with your models for memory
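
The Retry-After row is easy to picture in code. The sketch below shows one way to turn the header into a cooldown, falling back to a fixed default such as `COOLDOWN_SEC`. The function name `cooldownSeconds` is hypothetical, and the fallback rules are assumptions rather than Alvus's documented behavior.

```go
package main

import (
	"net/http"
	"strconv"
	"strings"
	"time"
)

// cooldownSeconds converts a Retry-After header value into a cooldown.
// The header may hold delta-seconds ("30") or an HTTP-date; anything
// missing or unparseable falls back to defSec (an assumed policy).
func cooldownSeconds(retryAfter string, defSec int) int {
	v := strings.TrimSpace(retryAfter)
	if v == "" {
		return defSec
	}
	if secs, err := strconv.Atoi(v); err == nil {
		if secs < 0 {
			return defSec
		}
		return secs
	}
	if t, err := http.ParseTime(v); err == nil {
		d := time.Until(t)
		if d <= 0 {
			return 0 // the date is already in the past
		}
		// Round up so the key is never reused early.
		return int((d + time.Second - 1) / time.Second)
	}
	return defSec
}
```

A caller would pass `resp.Header.Get("Retry-After")` and the configured default, then rest the key for the returned number of seconds.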

Quickstart

1. Get the binary

Build from source (requires Go 1.21+):

git clone https://github.com/YOUR_USERNAME/alvus.git
cd alvus
go build -o alvus main.go

Cross-compile for a remote server (e.g. Raspberry Pi Zero, 32-bit x86):

# Pi Zero / other ARMv6 boards (GOARM=6 pins the ARM version; a cross-compile may otherwise target ARMv7)
GOOS=linux GOARCH=arm GOARM=6 CGO_ENABLED=0 go build -o alvus main.go

# 32-bit x86 (Atom, old netbooks, salvaged hardware)
GOOS=linux GOARCH=386 CGO_ENABLED=0 go build -o alvus main.go

The binary is fully static: drop it on the machine and run it. No external runtime, no dependencies, no install step.

Download a prebuilt release:

Go to Releases and grab the binary for your platform.

2. Configure

Create .env in the directory you start Alvus from (its working directory):

# Your API keys, comma-separated
API_KEYS=nvapi-xxxxxxxxxxxx,nvapi-yyyyyyyyyyyy,nvapi-zzzzzzzzzzzz

# Port to listen on (default: 3000)
PORT=3000

# Upstream API base URL (default: NVIDIA NIM)
TARGET_BASE_URL=https://integrate.api.nvidia.com/v1

# Seconds to cool down a key after a 429, 502, or 503 (default: 60)
COOLDOWN_SEC=60

Real environment variables take precedence over .env — useful for systemd or containers.
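
The precedence rule can be sketched with nothing but the standard library. This is an illustration, not the parser Alvus ships: `parseEnv`, `lookup`, `loadSetting`, and `splitKeys` are hypothetical names, and details such as quote stripping are assumptions.

```go
package main

import (
	"bufio"
	"os"
	"strings"
)

// parseEnv reads KEY=VALUE lines, skipping blank lines and # comments.
func parseEnv(content string) map[string]string {
	out := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		k, v, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a KEY=VALUE pair
		}
		// Stripping surrounding quotes is an assumed convenience.
		out[strings.TrimSpace(k)] = strings.Trim(strings.TrimSpace(v), `"'`)
	}
	return out
}

// lookup resolves one setting: real environment first, then .env, then def.
func lookup(name string, file map[string]string, def string,
	getenv func(string) (string, bool)) string {
	if v, ok := getenv(name); ok {
		return v
	}
	if v, ok := file[name]; ok {
		return v
	}
	return def
}

// loadSetting wires lookup to the real process environment and a file.
func loadSetting(path, name, def string) string {
	data, _ := os.ReadFile(path) // a missing file simply yields no entries
	return lookup(name, parseEnv(string(data)), def, os.LookupEnv)
}

// splitKeys turns the comma-separated API_KEYS value into a slice.
func splitKeys(s string) []string {
	var keys []string
	for _, k := range strings.Split(s, ",") {
		if k = strings.TrimSpace(k); k != "" {
			keys = append(keys, k)
		}
	}
	return keys
}
```

For example, `loadSetting(".env", "PORT", "3000")` would return the exported `PORT` if one is set, otherwise the value from the file, otherwise `3000`.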

3. Run

./alvus

⚡ Alvus started on :3000
   Target  : https://integrate.api.nvidia.com/v1
   Keys    : 3 loaded
   Cooldown: 60s per key on 429/502/503

4. Point your agent at it

OpenClaw

{
  "models": {
    "providers": {
      "nim": {
        "baseUrl": "http://localhost:3000/v1",
        "apiKey": "sk-proxy-dummy"
      }
    },
    "defaults": {
      "provider": "nim",
      "model": "deepseek-ai/deepseek-r1"
    }
  }
}

PicoClaw / Nanobot

{
  "model_name": "deepseek-r1",
  "model": "openai/deepseek-ai/deepseek-r1",
  "api_base": "http://localhost:3000/v1",
  "api_keys": ["sk-proxy-dummy"]
}

Cline (VS Code)

Setting	Value
API Provider	`OpenAI Compatible`
Base URL	`http://localhost:3000/v1`
API Key	`sk-proxy-dummy` (any string)
Model ID	`deepseek-ai/deepseek-r1`

Cursor

Settings → Models → set base URL to http://localhost:3000/v1, any dummy key.

Aider

aider --openai-api-base http://localhost:3000/v1 --openai-api-key sk-dummy

How It Works

1. Request arrives from your agent or IDE
2. Body is buffered (needed for retry replay)
3. Round-robin picks the next available key
4. Request forwarded upstream with that key injected
   │
   ├── ✅ 2xx/3xx → request count incremented, headers + body streamed back, done
   ├── ❄️ 429/502/503 → key enters cooldown, retry with next key
   ├── 🔑 401/403 → key permanently removed from pool
   └── ⚠️ other 4xx/5xx → passed through as-is

Your agent sees a clean stream or a final error. Never a 429.
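
The decision tree above can be sketched as a small classifier plus a retry loop. The names `classify` and `forward` are hypothetical. The sketch also assumes the request moves on to the next key after a 401/403, which the flow above does not spell out, and that exhausting every key ends in a 503.

```go
package main

import "net/http"

// action is what the proxy does with an upstream response.
type action int

const (
	passThrough  action = iota // stream the response back unchanged
	coolAndRetry               // rest this key, try the next one
	disableKey                 // drop this key from the pool
)

// classify maps an upstream status code to an action.
func classify(status int) action {
	switch status {
	case http.StatusTooManyRequests, http.StatusBadGateway,
		http.StatusServiceUnavailable:
		return coolAndRetry
	case http.StatusUnauthorized, http.StatusForbidden:
		return disableKey
	default:
		return passThrough
	}
}

// forward replays the buffered body with each key in turn until one
// attempt yields a response worth passing through. send stands in for
// the real upstream call and returns the status it received.
func forward(keys []string, body []byte,
	send func(key string, body []byte) int) (status int, cooled, disabled []string) {
	for _, k := range keys {
		status = send(k, body)
		switch classify(status) {
		case coolAndRetry:
			cooled = append(cooled, k)
		case disableKey:
			disabled = append(disabled, k) // assumed: keep trying other keys
		default:
			return status, cooled, disabled
		}
	}
	// Every key was cooled or disabled.
	return http.StatusServiceUnavailable, cooled, disabled
}
```

Because the body is buffered up front, the same bytes can be handed to `send` on every attempt, which is why step 2 exists.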

Key Status

curl http://localhost:3000/health

{
  "status": "ok",
  "keys": 3,
  "details": [
    {
      "index": 0,
      "key": "nvapi-xxxxxxxxxxxx",
      "status": "ready",
      "requests_per_minute": 15,
      "last_used": "2023-11-15T14:30:00Z",
      "cooldown_until": "2023-11-15T14:29:00Z"
    },
    {
      "index": 1,
      "key": "nvapi-yyyyyyyyyyyy",
      "status": "cooling(42s)",
      "requests_per_minute": 40,
      "last_used": "2023-11-15T14:31:00Z",
      "cooldown_until": "2023-11-15T14:32:00Z"
    }
  ]
}
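
If you want to consume this from a script or monitor, a small Go client might look like the sketch below. The JSON field names come from the sample response above; the type and function names (`Health`, `KeyStatus`, `fetchHealth`, `readyKeys`) are hypothetical, and treating exactly the string "ready" as usable is an assumption.

```go
package main

import (
	"encoding/json"
	"net/http"
	"strings"
	"time"
)

// KeyStatus mirrors one entry of "details" in the /health response.
type KeyStatus struct {
	Index             int       `json:"index"`
	Key               string    `json:"key"`
	Status            string    `json:"status"`
	RequestsPerMinute int       `json:"requests_per_minute"`
	LastUsed          time.Time `json:"last_used"`
	CooldownUntil     time.Time `json:"cooldown_until"`
}

// Health mirrors the top-level /health response.
type Health struct {
	Status  string      `json:"status"`
	Keys    int         `json:"keys"`
	Details []KeyStatus `json:"details"`
}

// parseHealth decodes a /health response body.
func parseHealth(data []byte) (Health, error) {
	var h Health
	err := json.Unmarshal(data, &h)
	return h, err
}

// fetchHealth GETs baseURL + "/health" and decodes the result.
func fetchHealth(baseURL string) (Health, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(strings.TrimRight(baseURL, "/") + "/health")
	if err != nil {
		return Health{}, err
	}
	defer resp.Body.Close()
	var h Health
	err = json.NewDecoder(resp.Body).Decode(&h)
	return h, err
}

// readyKeys lists the indices of keys whose status is exactly "ready".
func readyKeys(h Health) []int {
	var idx []int
	for _, d := range h.Details {
		if d.Status == "ready" {
			idx = append(idx, d.Index)
		}
	}
	return idx
}
```

Calling `fetchHealth("http://localhost:3000")` and then `readyKeys` gives a quick count of how much headroom the pool has left.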

Other Providers

TARGET_BASE_URL is all you need to change:

# OpenRouter
TARGET_BASE_URL=https://openrouter.ai/api/v1

# Together AI
TARGET_BASE_URL=https://api.together.xyz/v1

# Groq
TARGET_BASE_URL=https://api.groq.com/openai/v1

# Any other OpenAI-compatible endpoint
TARGET_BASE_URL=https://your-provider.com/v1

Running as a Service (systemd)

[Unit]
Description=Alvus
After=network.target

[Service]
ExecStart=/usr/local/bin/alvus
WorkingDirectory=/etc/alvus
Restart=on-failure
RestartSec=5
# Graceful shutdown on stop/restart
KillSignal=SIGTERM
TimeoutStopSec=10

[Install]
WantedBy=multi-user.target

Put your .env in /etc/alvus/. Reload and start:

sudo systemctl daemon-reload
sudo systemctl enable --now alvus

Alvus handles SIGINT and SIGTERM gracefully, allowing in-flight requests to complete before shutting down (with a 5-second timeout).
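
For readers curious how this kind of shutdown is usually done with the standard library, here is a minimal sketch. It is a common pattern, not a copy of Alvus's code: `serve` and `runUntilSignal` are hypothetical names, and only the 5-second grace period comes from the text above.

```go
package main

import (
	"context"
	"errors"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// serve runs an HTTP server until stop is closed, then gives in-flight
// requests up to grace to finish before returning.
func serve(addr string, handler http.Handler, stop <-chan struct{},
	grace time.Duration) error {
	srv := &http.Server{Addr: addr, Handler: handler}
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case err := <-errCh:
		return err // failed to start, e.g. port already in use
	case <-stop:
	}

	ctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return err // grace period expired with requests still in flight
	}
	// ListenAndServe reports ErrServerClosed after a clean shutdown.
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// runUntilSignal serves until SIGINT or SIGTERM arrives.
func runUntilSignal(addr string, handler http.Handler) error {
	ctx, stop := signal.NotifyContext(context.Background(),
		os.Interrupt, syscall.SIGTERM)
	defer stop()
	return serve(addr, handler, ctx.Done(), 5*time.Second)
}
```

The 5-second grace period sits comfortably inside the unit file's `TimeoutStopSec=10`, so systemd should not need to escalate to SIGKILL.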

FAQ

Do I need Go installed to run this?
No. Download a prebuilt binary from Releases.

Are my keys safe?
Keys live in .env on your machine and are only ever sent to the upstream provider. Alvus logs key indices, never key values.

What if ALL keys are cooling?
Alvus waits for the soonest key to become available and retries, up to 10 times. If everything stays exhausted, it returns 503. In practice, with 3 keys and a 60s window this is very hard to trigger.
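
A rough sketch of that wait-and-retry loop is shown below. The names `acquire`, `soonest`, and `sleepSeconds` are hypothetical, and reading "up to 10 times" as ten attempts in total is an assumption.

```go
package main

import "time"

// maxAttempts is assumed to mean ten attempts in total.
const maxAttempts = 10

// soonest returns the smallest remaining cooldown, in seconds.
func soonest(remaining []int) int {
	if len(remaining) == 0 {
		return 0
	}
	min := remaining[0]
	for _, r := range remaining[1:] {
		if r < min {
			min = r
		}
	}
	return min
}

// acquire asks pick for a key up to maxAttempts times. When every key
// is cooling, pick reports the seconds until the soonest one is ready
// and acquire sleeps that long before asking again. A false result
// means the caller should answer with 503.
func acquire(pick func() (key string, waitSec int, ok bool),
	sleep func(sec int)) (string, bool) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		key, waitSec, ok := pick()
		if ok {
			return key, true
		}
		if attempt < maxAttempts-1 {
			sleep(waitSec) // no point sleeping after the last attempt
		}
	}
	return "", false
}

// sleepSeconds is the real-clock sleeper to pass to acquire.
func sleepSeconds(sec int) {
	time.Sleep(time.Duration(sec) * time.Second)
}
```

Passing the sleeper in as a function keeps the loop testable without real waiting; production code would hand it `sleepSeconds`.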

Can I reload keys without restarting?
Yes. Alvus hot-reloads .env: edit the file and the new configuration is picked up within about 1 second. No restart needed.
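
One stdlib-only way to get this behavior is to poll the file's modification time. The sketch below assumes polling; the source does not say how Alvus detects changes, and `stamp`, `statStamp`, and `watch` are hypothetical names.

```go
package main

import (
	"context"
	"os"
	"time"
)

// stamp is a cheap fingerprint of a file's state.
type stamp struct {
	modNano int64 // modification time in Unix nanoseconds
	size    int64
}

// statStamp fingerprints the file at path.
func statStamp(path string) (stamp, error) {
	info, err := os.Stat(path)
	if err != nil {
		return stamp{}, err
	}
	return stamp{info.ModTime().UnixNano(), info.Size()}, nil
}

// watch polls path every interval and calls reload whenever the
// fingerprint changes. It returns when ctx is cancelled.
func watch(ctx context.Context, path string, interval time.Duration,
	reload func()) {
	last, _ := statStamp(path)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			cur, err := statStamp(path)
			if err != nil || cur == last {
				continue // file missing or unchanged: keep current config
			}
			last = cur
			reload()
		}
	}
}
```

Running `go watch(ctx, ".env", time.Second, reload)` would match the roughly one-second pickup described above, with `reload` re-reading the file and swapping in the new keys.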

Does it work on a Raspberry Pi Zero / 32-bit hardware?
Yes. Prebuilt binaries include linux/arm and linux/386. The binary is fully static, with no external runtime needed.

How much memory does it use?
Around 2 MB at idle. It's a single static Go binary with no external runtime, so you won't notice it sitting next to a running model.

Roadmap

✅ Hot-reload when .env changes (no restart needed)
✅ Per-key request counters and detailed status in /health
✅ Web dashboard (opt-in, zero-dep binary stays the same)

Contributing

PRs welcome. This project lives in a single file with zero external dependencies — keep it that way. If a feature needs an import beyond stdlib, it doesn't belong in main.go. Open an issue first and we'll figure out the right shape for it.

License

MIT.

Built at 2am when an OpenClaw task hit its fifth 429 in a row.