Awesome AI Gateway

A curated list of AI gateways & LLM proxies — LiteLLM, OpenRouter, Portkey, Kong, Higress, new-api and 50+ more — organized by what you actually need: lowest cost, compliance, self-hosting, smart routing, or China-ecosystem support. Star counts and releases refresh daily via CI.

Languages: English · 简体中文

An AI gateway sits between your code and LLM providers: one endpoint, one key, many models. It handles routing, failover, caching, rate limits, cost tracking and guardrails — so you change a base_url instead of rewriting your app for every provider.

📊 New: Evaluation Set →

Not just a link list: a sourced, reproducible evaluation layer with model benchmarks (AA Index, GPQA, SWE-bench, Arena), real-world token cost computed by a unit-tested script (writing a 100K-token report costs $0.03 on DeepSeek vs $3.01 on GPT-5.5, roughly a 100× spread at these rounded prices), and a gateway scorecard rating compliance · price · security · stability. Read it →

📊 Evaluation set: model benchmarks · token cost · gateway scorecard
Which gateway should I use? (decision tree)
Quick comparison
💰 Cost-first: cheapest multi-model access
🔓 Self-hosted open source
🏢 Enterprise & compliance
☁️ First-party gateways (cloud & model vendors)
🇨🇳 China ecosystem
🧠 Smart routing & model selection
📊 Observability & cost tracking
🤖 MCP & agent gateways
☸️ Kubernetes-native & inference infra
📰 What's new
🚀 Recent releases (auto-updated)
How to choose safely
Contributing

Which gateway should I use?

Do you want to self-host?
│
├─ NO — hosted, minimal ops
│   ├─ Cheapest access to many models ──────────▶ OpenRouter · Vercel AI Gateway (0% markup)
│   ├─ Free control plane over your own keys ───▶ Cloudflare AI Gateway
│   ├─ EU data residency matters ───────────────▶ Requesty · Eden AI · nexos.ai
│   └─ Already on one cloud ────────────────────▶ AWS Bedrock · Azure APIM · Vertex AI
│
└─ YES — self-hosted / open source
    ├─ Python stack, broadest features ─────────▶ LiteLLM
    ├─ Raw performance (Go/Rust/TS) ────────────▶ Bifrost · TensorZero · Portkey Gateway
    ├─ Built-in evals + observability ──────────▶ TensorZero · Helicone
    ├─ Key distribution / billing / CN models ──▶ new-api · one-api · GPT-Load
    ├─ Enterprise K8s, audit, guardrails ───────▶ Kong · Higress · APISIX · Envoy AI Gateway
    └─ Governing AI agents & MCP traffic ───────▶ agentgateway · Lunar.dev
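
The tree above can also be read as a plain lookup table. The sketch below is a hypothetical helper: the function name and the short "need" keys are made up for illustration, while the recommendations are copied from the tree as written.

```python
# Hypothetical helper: the need keys are illustrative labels, not an official taxonomy.
RECOMMENDATIONS = {
    False: {  # hosted, minimal ops
        "cheapest": ["OpenRouter", "Vercel AI Gateway"],
        "own-keys": ["Cloudflare AI Gateway"],
        "eu-residency": ["Requesty", "Eden AI", "nexos.ai"],
        "single-cloud": ["AWS Bedrock", "Azure APIM", "Vertex AI"],
    },
    True: {  # self-hosted / open source
        "python-broad": ["LiteLLM"],
        "performance": ["Bifrost", "TensorZero", "Portkey Gateway"],
        "evals-observability": ["TensorZero", "Helicone"],
        "billing-cn": ["new-api", "one-api", "GPT-Load"],
        "enterprise-k8s": ["Kong", "Higress", "APISIX", "Envoy AI Gateway"],
        "agents-mcp": ["agentgateway", "Lunar.dev"],
    },
}


def recommend(self_host: bool, need: str) -> list[str]:
    """Return the gateways the decision tree suggests for one branch."""
    branch = RECOMMENDATIONS[self_host]
    if need not in branch:
        raise ValueError(f"unknown need {need!r}; expected one of {sorted(branch)}")
    return branch[need]


print(recommend(False, "eu-residency"))  # ['Requesty', 'Eden AI', 'nexos.ai']
```

Treat the result as a shortlist to evaluate, not a verdict; several projects appear in more than one branch.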

Quick start (drop-in)

The whole promise of a gateway: change base_url, keep your OpenAI code. Same request, now with routing, fallback, caching and cost tracking.

from openai import OpenAI

# Pick ONE client. Hosted example: OpenRouter (400+ models, one key).
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",
)

# Self-hosted alternative: a LiteLLM proxy you run.
# Uncomment to use it instead; if both assignments stay active,
# this one silently replaces the OpenRouter client above.
# client = OpenAI(
#     base_url="http://localhost:4000",
#     api_key="sk-litellm-...",
# )

resp = client.chat.completions.create(
    model="anthropic/claude-fable-5",        # ask the gateway for any provider's model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)       # the assistant's reply text

Base URLs for the other gateways are in each project's docs (linked below). Most are OpenAI-compatible, so the only lines that change are base_url and api_key.
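
Because only the base URL and key differ, the choice of gateway can live in configuration rather than code. This is a minimal sketch under stated assumptions: the environment variable names AI_GATEWAY and AI_GATEWAY_KEY and the helper client_kwargs are hypothetical, and the two base URLs are the ones from the quick start.

```python
import os

# Base URLs taken from the quick start; the env var names are assumptions.
GATEWAYS = {
    "openrouter": "https://openrouter.ai/api/v1",
    "litellm": "http://localhost:4000",
}


def client_kwargs(env=os.environ) -> dict:
    """Build the two OpenAI() arguments that differ between gateways."""
    name = env.get("AI_GATEWAY", "openrouter")
    if name not in GATEWAYS:
        raise ValueError(f"unknown gateway {name!r}; expected one of {sorted(GATEWAYS)}")
    return {"base_url": GATEWAYS[name], "api_key": env.get("AI_GATEWAY_KEY", "")}


# Passing a plain dict instead of os.environ keeps the example runnable offline.
print(client_kwargs({"AI_GATEWAY": "litellm", "AI_GATEWAY_KEY": "sk-litellm-..."}))
```

In an application this would be used as `client = OpenAI(**client_kwargs())`, so switching gateways becomes a deployment setting.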

Quick comparison

Stars auto-refresh daily. ✅ built-in · ➕ via plugin/paid tier · ❌ not available.

Project	Type	Stars	License	Multi-provider	Fallback / LB	Caching	Guardrails	Cost tracking
LiteLLM	OSS proxy + SDK	⭐ 50k	MIT¹	✅ 100+	✅	✅	✅	✅
new-api	OSS relay/billing	⭐ 38.3k	AGPL-3.0	✅	✅	➕	➕	✅
one-api	OSS relay/billing	⭐ 34.8k	MIT	✅	✅	❌	❌	✅
Kong AI Gateway	OSS API gateway	⭐ 43.6k	Apache-2.0	✅	✅	✅ semantic	✅	✅
Apache APISIX	OSS API gateway	⭐ 16.7k	Apache-2.0	✅	✅	➕	➕	➕
Portkey Gateway	OSS gateway + SaaS	⭐ 12k	MIT	✅ 1600+	✅	✅	✅ 50+	➕ SaaS
TensorZero	OSS LLMOps stack	⭐ 11.5k	Apache-2.0	✅	✅	✅	➕	✅
Higress	OSS AI-native gateway	⭐ 8.6k	Apache-2.0	✅	✅	✅	✅	✅
GPT-Load	OSS key-pool proxy	⭐ 6.2k	MIT	✅	✅ key rotation	❌	❌	➕
Bifrost	OSS gateway (Go)	⭐ 5.7k	Apache-2.0	✅	✅ adaptive	✅	✅	✅
Helicone	OSS observability + gateway	⭐ 5.8k	Apache-2.0	✅	✅	✅	➕	✅
Envoy AI Gateway	OSS K8s gateway	⭐ 1.7k	Apache-2.0	✅	✅	➕	➕	✅
OpenRouter	SaaS marketplace	—	Commercial	✅ 400+	✅	✅	➕	✅
Vercel AI Gateway	SaaS (0% markup)	—	Commercial	✅ 100s	✅	❌	❌	✅
Cloudflare AI Gateway	SaaS control plane	—	Commercial (free tier)	✅	✅ dynamic	✅	✅	✅ budgets

¹ LiteLLM core is MIT; the repo contains a separately licensed enterprise directory.

💰 Cost-first: cheapest multi-model access

Pain point: "I want many models for the least money and zero ops."

OpenRouter — The dominant model marketplace: 400+ models behind one OpenAI-compatible API, pay-as-you-go with automatic failover; ~5.5% fee when buying credits. $113M Series B (May 2026), ~8M users.
Vercel AI Gateway — Hundreds of models at provider list price (0% markup), $5/month free credits, zero-data-retention option; pairs naturally with the AI SDK.
Cloudflare AI Gateway — Free control plane in front of your own provider keys: caching, dynamic routing, unified billing, and dollar-denominated spend limits (2026 beta).
Requesty — EU-friendly OpenRouter alternative: 400+ models, sub-20ms failover, ~5% markup.
Eden AI — Unified API for 500+ models plus vision/OCR/speech; EU-based, ~5.5% platform fee.
Helicone AI Gateway (cloud) — Passthrough billing at 0% markup with observability bundled.
GPT-Load ⭐ 6.2k — High-performance Go proxy that rotates pools of API keys across channels to maximize quota usage.

💡 Squeeze more from any gateway: enable semantic caching (Kong, Bifrost, Zuplo), set spend limits (Cloudflare, Zuplo, Pydantic/Logfire), and route easy prompts to cheap models (see Smart routing).
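
To make the spend-limit idea concrete, here is a client-side sketch of a dollar-denominated cap. It is not how Cloudflare, Zuplo or any other gateway implements limits (they enforce them server-side); the class names are hypothetical and the per-million-token prices are placeholders, not any provider's real rates.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a request would push estimated spend past the cap."""


class SpendGuard:
    """Tracks estimated spend and refuses requests once a USD cap would be crossed."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    @staticmethod
    def cost(input_tokens: int, output_tokens: int,
             in_price_per_m: float, out_price_per_m: float) -> float:
        # Prices are expressed in USD per million tokens.
        return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

    def charge(self, input_tokens: int, output_tokens: int,
               in_price_per_m: float, out_price_per_m: float) -> float:
        amount = self.cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m)
        if self.spent_usd + amount > self.limit_usd:
            raise BudgetExceeded(
                f"request of ${amount:.4f} would exceed the ${self.limit_usd:.2f} cap"
            )
        self.spent_usd += amount
        return amount


guard = SpendGuard(limit_usd=1.00)
# Placeholder prices: $0.50 per million input tokens, $2.00 per million output tokens.
print(guard.charge(2_000, 100_000, in_price_per_m=0.50, out_price_per_m=2.00))
```

Checking the cap before the request is sent is the important design choice: a limit that is only evaluated after the fact reports overspend instead of preventing it.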

🔓 Self-hosted open source

Pain point: "My keys, my infra, no per-token middleman fee."

LiteLLM ⭐ 50k — The default choice: Python SDK + proxy server speaking OpenAI format to 100+ providers, with virtual keys, budgets, load balancing and guardrails.
Portkey Gateway ⭐ 12k — Fast TypeScript gateway (1,600+ models, 50+ guardrails) that also powers Portkey's commercial LLMOps platform.
TensorZero ⭐ 11.5k — Rust gateway unified with observability, evals, experimentation and optimization — built around a data/feedback flywheel.
Bifrost ⭐ 5.7k — Go gateway from Maxim AI claiming ~50x LiteLLM throughput; adaptive load balancing, cluster mode, MCP support.
Helicone ⭐ 5.8k — Observability-first platform (YC W23) with a Rust ai-gateway ⭐ 600.
Plano ⭐ 6.6k — AI-native proxy and data plane for agents (formerly Arch Gateway / archgw).
LLM Gateway ⭐ 1.3k — Open-source OpenRouter alternative: route, manage and analyze requests across providers.
APIPark ⭐ 1.8k — Cloud-native LLM API management and distribution platform.
Pydantic AI Gateway ⭐ 189 — BYOK gateway with cost caps and OTel, now folded into Pydantic Logfire.
OptiLLM ⭐ 4.1k — Optimizing inference proxy that boosts accuracy via test-time compute techniques.
aisuite ⭐ 13.8k — Andrew Ng's unified multi-provider client. A library rather than a deployable proxy — fits when you don't want network hops.
⚠️ Stale but historically notable: BricksLLM ⭐ 1.2k (PII masking, per-key limits; inactive since early 2025), Glide ⭐ 160 (inactive since 2024).

🏢 Enterprise & compliance

Pain point: "Audit logs, PII redaction, RBAC, on-prem, and the EU AI Act (enforceable Aug 2026)."

Kong AI Gateway ⭐ 43.6k — Mature API gateway with AI plugins: semantic caching/routing, prompt guard, token rate-limiting; Konnect for managed control plane.
Apache APISIX ⭐ 16.7k — Cloud-native API + AI gateway with ai-proxy / ai-proxy-multi plugins.
Envoy AI Gateway ⭐ 1.7k — CNCF-aligned GenAI access on Envoy Gateway, backed by Tetrate and Bloomberg.
kgateway ⭐ 5.6k — CNCF API/AI gateway, the base of Solo.io's commercial Gloo AI Gateway.
TrueFoundry AI Gateway — Enterprise gateway with routing, guardrails and RBAC, deployable into your K8s/VPC.
nexos.ai — Enterprise AI gateway/orchestration from the Nord Security founders (€30M Series A, Oct 2025).
Tyk AI Studio — AI governance suite: budgets, model catalogs, guardrails on Tyk's gateway.
Gravitee Agent Mesh — LLM Proxy, MCP Proxy and A2A support inside Gravitee APIM.
WSO2 AI Gateway — Egress management for LLM traffic: model routing, semantic caching, guardrails.
F5 AI Gateway — Containerized AI traffic gateway; data-leakage detection via the LeakSignal acquisition (Nov 2025).
IBM API Connect AI Gateway — Policy enforcement, masking and audit for LLM traffic.
MuleSoft AI / Omni Gateway — Governs LLM, MCP and agent traffic alongside classic APIs.
Lunar.dev ⭐ 453 — Egress consumption gateway repositioned around MCP/agent governance.

☁️ First-party gateways (cloud & model vendors)

Pain point: "We're already committed to one cloud — give us the native path."

AWS Bedrock — Multi-model access via the unified Converse API, cross-region inference, and AgentCore Gateway for tools/MCP.
Azure API Management — GenAI gateway — Token limits, semantic caching and load balancing in front of Azure OpenAI / AI Foundry.
Google Apigee + Vertex AI — LLM gateway patterns on Apigee with Vertex Model Garden as the managed hub.
Cloudflare AI Gateway — See Cost-first; the strongest free first-party option.
Vercel AI Gateway — GA, 0% markup, ZDR option; the default for Next.js/AI SDK shops.
Databricks Unity AI Gateway — Mosaic AI Gateway folded into Unity Catalog, adding agent + MCP governance.

🇨🇳 China ecosystem

Pain point: "Domestic models (Qwen/DeepSeek/GLM/Kimi), CNY payment, key distribution & billing for teams."

new-api ⭐ 38.3k — The most active one-api fork, now a "unified AI model hub": protocol conversion, billing, Rerank/Realtime endpoints. AGPL-3.0.
one-api ⭐ 34.8k — The original LLM API management and distribution system (OpenAI/Azure/Claude/Gemini/DeepSeek/Doubao…); development has slowed.
Higress ⭐ 8.6k — Alibaba's AI-native gateway on Envoy/Istio, with first-class Tongyi/DeepSeek support; hosted version at higress.ai.
GPT-Load ⭐ 6.2k — Multi-channel proxy in Go with smart API-key rotation.
one-hub ⭐ 2.8k — one-api fork with better non-OpenAI function calling and stats.
simple-one-api ⭐ 2.3k — Single binary adapting Qianfan/Spark/Hunyuan/MiniMax/DeepSeek to the OpenAI interface.
Veloera ⭐ 1.6k — Newer relay platform in the one-api/new-api lineage.
uni-api ⭐ 1.2k — Lightweight single-config unified API manager, no frontend.
APIPark ⭐ 1.8k — China-origin, cloud-native AI & API gateway with an open developer portal.

⚠️ This list deliberately excludes reverse-engineered "free-api" relays (ToS violations, account risk). For price comparisons of commercial relay services (中转站), see awesome-ai-api-proxy.

🧠 Smart routing & model selection

Pain point: "Send each prompt to the cheapest model that can handle it."

Not Diamond — Model-routing service that picks a model per prompt; powers OpenRouter's Auto router.
Martian — Early commercial model router; Accenture partnership.
RouteLLM ⭐ 5k — LMSYS's open router framework (research-grade; inactive since 2024 but still the canonical paper/code).
OpenRouter Auto — One model id (openrouter/auto) that routes per-prompt.
Unify — Early neural LLM router (company since pivoted to agents).
Bifrost adaptive load balancing / Cloudflare dynamic routing — routing built into gateways themselves.
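
The routers listed above rely on learned models or gateway-level signals. As a deliberately naive stand-in for the same idea, the sketch below routes by prompt length and a keyword list. The model ids, the keywords and the 500-character threshold are all placeholder assumptions, not taken from any of these products.

```python
# Assumptions: model ids, keyword list and length threshold are placeholders.
CHEAP_MODEL = "cheap-model"
FLAGSHIP_MODEL = "flagship-model"
HARD_HINTS = ("prove", "debug", "refactor", "step by step", "analyze")


def pick_model(prompt: str, max_easy_chars: int = 500) -> str:
    """Send long or reasoning-heavy prompts to the flagship model, the rest to the cheap one."""
    text = prompt.lower()
    looks_hard = len(text) > max_easy_chars or any(hint in text for hint in HARD_HINTS)
    return FLAGSHIP_MODEL if looks_hard else CHEAP_MODEL


print(pick_model("Translate 'hello' to French"))      # cheap-model
print(pick_model("Debug this stack trace for me"))    # flagship-model
```

A heuristic like this is cheap to run but easy to fool, so the returned id would normally be passed as the `model` argument and the results compared against always using the flagship model before trusting it.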

📊 Observability & cost tracking

Pain point: "Who spent what, on which model, and why did quality drop?"

Helicone ⭐ 5.8k — Logs, costs, sessions, prompt experiments; one-line proxy integration.
TensorZero ⭐ 11.5k — Gateway + observability + evals in one Rust binary, data stays in your ClickHouse.
Portkey — Full LLMOps suite over its OSS gateway: traces, budgets, prompt management.
vLLora (ex-LangDB) ⭐ 802 — Agent debugging and observability from the LangDB team.
Braintrust Proxy ⭐ 398 — Caching proxy wired into Braintrust evals.
MLflow AI Gateway ⭐ 26.4k — Unified endpoints + governance inside the MLflow platform.

🤖 MCP & agent gateways

Pain point: "Agents call tools now — govern MCP traffic like you govern APIs." The newest category (2025–2026).

agentgateway ⭐ 3.2k — CNCF proxy for agentic traffic: MCP governance and agent-to-agent (A2A) communication.
Lunar.dev MCPX ⭐ 453 — Gateway for managing MCP server consumption.
Tetrate Agent Router Service — Managed Envoy AI Gateway fleet: LLM + MCP gateway with guardrails (~5% fee).
Zuplo AI Gateway — Programmable policies: USD spend limits, prompt-injection detection, secret masking, MCP support.
NetFoundry MCP/LLM Gateways — Zero-trust gateways for AI deployments (launched June 2026).
AWS AgentCore Gateway — Tool/MCP gateway inside Bedrock AgentCore.

☸️ Kubernetes-native & inference infra

Pain point: "Routing to self-hosted models (vLLM/Ollama) inside the cluster, GPU-aware."

Gateway API Inference Extension ⭐ 690 — The Kubernetes standard for inference-aware routing.
AIBrix ⭐ 4.9k — Cost-efficient control plane for vLLM on K8s (ByteDance-origin).
llm-d ⭐ 3.3k — K8s-native distributed inference serving (Red Hat/Google/IBM-backed).
Higress ⭐ 8.6k / Kong ⭐ 43.6k / Envoy AI Gateway ⭐ 1.7k — all implement inference-extension-style routing.
Traefik Hub AI Gateway — LLM routing/security in Traefik's commercial runtime.
Inference Gateway ⭐ 124 — Small cloud-native gateway unifying cloud + local (Ollama) providers.

📰 What's new

Curated monthly. Last review: 2026-06-11.

2026-05 · OpenRouter raised a $113M Series B led by CapitalG at a $1.3B valuation — ~8M users, ~100T tokens/month. (TechCrunch)
2026-06 · NetFoundry launched zero-trust MCP and LLM gateways; Cisco Investments joined its Series A. (PR Newswire)
2026 · Cloudflare AI Gateway shipped dollar-denominated spend limits (public beta) on top of dynamic routing and unified billing. (Cloudflare blog)
2025-11 · Pydantic AI Gateway went open beta and has since merged into Logfire; F5 added data-leakage detection to its AI Gateway via the LeakSignal acquisition. (Pydantic Logfire, F5)
Trend · MCP gateways emerged as a distinct category; spend-limit enforcement became table stakes; the EU AI Act (enforceable Aug 2026) is driving the compliance bucket; new-api overtook one-api as the most active China-ecosystem relay.

🚀 Recent releases (auto-updated)

2026-06-11 · yym68686/uni-api v1.7.114 — Release 1.7.114
2026-06-11 · andrewyng/aisuite app-v0.1.0 — OpenCoworker 0.1.0
2026-06-09 · katanemo/plano 0.4.24 — 0.4.24
2026-06-09 · BerriAI/litellm v1.88.1 — v1.88.1
2026-06-08 · maximhq/bifrost ent-v1.4.8-base — Enterprise v1.4.8 base
2026-06-06 · envoyproxy/ai-gateway v0.7.0 — v0.7.0
2026-06-04 · kgateway-dev/kgateway v2.3.2 — v2.3.2
2026-06-04 · tensorzero/tensorzero 2026.6.0 — 2026.6.0
2026-06-04 · Kong/kong 3.9.2 — 3.9.2
2026-06-01 · mlflow/mlflow v3.13.0 — v3.13.0
2026-05-29 · tbphp/gpt-load v1.4.8 — v1.4.8
2026-05-26 · QuantumNous/new-api v1.0.0-rc.10 — v1.0.0-rc.10

How to choose safely

Check the markup. Marketplaces charge 0–6% — for high volume, self-hosting or 0%-markup gateways (Vercel, Helicone cloud) pay for themselves fast.
Verify model fidelity. Some relays silently downgrade models. Send a canary prompt (e.g. a known hard reasoning question) through the gateway and direct to the provider, then diff.
Mind data flow. Every gateway sees your prompts. For sensitive data: self-host, or require ZDR (zero data retention) in writing.
License check before embedding. new-api is AGPL-3.0; LiteLLM has an enterprise-licensed directory; "open core" ≠ everything free.
Project health. Star count ≠ maintenance. Check last release date — several once-popular gateways (BricksLLM, Glide, RouteLLM) are effectively unmaintained; this list labels them.
Avoid gray-market relays reselling reverse-engineered or stolen-quota access — account bans and data leaks are your risk, not theirs.
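
The canary check in "Verify model fidelity" can be scripted. This is a minimal sketch, assuming you wrap each path (through the gateway, direct to the provider) in a function that takes a prompt and returns the reply text. The name canary_check and the 0.8 similarity threshold are hypothetical, and because model output varies between calls, a low score is a reason to look closer, not proof of a downgrade.

```python
from difflib import SequenceMatcher
from typing import Callable


def canary_check(
    prompt: str,
    via_gateway: Callable[[str], str],
    direct: Callable[[str], str],
    min_similarity: float = 0.8,
) -> dict:
    """Send one prompt down both paths and report how similar the answers are."""
    gateway_answer = via_gateway(prompt)
    direct_answer = direct(prompt)
    ratio = SequenceMatcher(None, gateway_answer, direct_answer).ratio()
    return {"similarity": round(ratio, 3), "suspicious": ratio < min_similarity}


# Stub callables stand in for real API calls so the example runs offline.
def honest(prompt: str) -> str:
    return "The answer is 408, since 17 * 24 = 408."


def downgraded(prompt: str) -> str:
    return "I don't know."


print(canary_check("What is 17 * 24?", honest, honest))
print(canary_check("What is 17 * 24?", downgraded, honest))
```

For real use, pin temperature to 0 and prefer prompts with a short, checkable answer, then run several canaries before drawing conclusions.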

FAQ

What is an AI gateway (LLM gateway)?
A proxy between your code and LLM providers: one OpenAI-compatible endpoint and key for many models, adding routing, failover, caching, rate limits, cost tracking and guardrails. See the intro.
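
Failover is the easiest of these features to picture. The sketch below shows the general pattern a gateway applies on your behalf: try providers in order and return the first success. The function name with_failover is hypothetical, and real gateways add retries, timeouts and health checks on top of this.

```python
from typing import Callable, Sequence


def with_failover(attempts: Sequence[Callable[[], str]]) -> str:
    """Call each attempt in order; return the first result, or fail if every one raises."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    last_error = None
    for attempt in attempts:
        try:
            return attempt()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


# Stub providers stand in for real API calls so the example runs offline.
def primary() -> str:
    raise TimeoutError("primary provider timed out")


def backup() -> str:
    return "Hello from the backup provider"


print(with_failover([primary, backup]))  # Hello from the backup provider
```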

AI gateway vs LLM router — what's the difference?
A router decides which model gets each request (e.g. cheap vs flagship); a gateway is the full proxy layer (auth, caching, observability, guardrails) that usually includes routing. See smart routing.

What's the best open-source AI gateway?
LiteLLM is the default for breadth (Python, 100+ providers). For raw performance pick Bifrost (Go) or TensorZero (Rust); for enterprise K8s pick Kong or Higress. Full list under self-hosted.

LiteLLM vs OpenRouter — which should I use?
OpenRouter is hosted (zero ops, ~5.5% fee, 400+ models); LiteLLM is self-hosted (your keys, your infra, $0 markup). Hosted to start, self-host when volume justifies it. Cost math in the evaluation set.
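
"When volume justifies it" comes down to a break-even calculation: a percentage fee grows with spend, while self-hosting is closer to a fixed cost. In this sketch the 5.5% fee comes from the text above, but the $200/month self-hosting figure is a made-up placeholder, and engineering time is ignored.

```python
def breakeven_monthly_spend(fee_rate: float, self_host_cost_usd: float) -> float:
    """Monthly provider spend above which a percentage fee costs more than self-hosting."""
    if fee_rate <= 0:
        raise ValueError("fee_rate must be positive")
    return self_host_cost_usd / fee_rate


# 5.5% fee from the text; $200/month for servers and upkeep is an assumption.
print(round(breakeven_monthly_spend(0.055, 200.0), 2))  # 3636.36
```

Under those assumptions, the hosted fee only overtakes the self-hosting bill once provider spend passes roughly $3,600 a month.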

What's the cheapest way to call many LLMs?
For zero ops: Vercel AI Gateway or Cloudflare AI Gateway (0% markup). For lowest token cost, route bulk work to cheap models — a 100K-token report runs $0.03 on DeepSeek vs $3.01 on GPT-5.5. See cost-first.

Are AI gateways safe? Who sees my prompts?
Every gateway sees your prompts. For sensitive data self-host or require zero-data-retention in writing; check the gateway scorecard for compliance/security ratings and known CVEs.

Contributing

Contributions welcome! Please read CONTRIBUTING.md first. Inclusion criteria, in short: the project must be an actual gateway/proxy/router for LLM or agent traffic (not an SDK wrapper or chat UI), publicly available, and active within the last 12 months — or clearly labeled as stale.

License

To the extent possible under law, the contributors have waived all copyright and related rights to this work.
