freellmpool
Health Warn
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Pass
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Pass
- Permissions — No dangerous permissions requested
No AI report is available for this listing yet.
Pool the free tiers of 16 LLM providers (Groq, Cerebras, NVIDIA NIM, Gemini, OpenRouter, Cloudflare…) behind one OpenAI-compatible /v1 endpoint. Free, zero-config, automatic failover. Codex/agent ready.
freellmpool
Pool the free tiers of 16 LLM providers (200+ live-validated models) behind one
OpenAI-compatible endpoint — as a CLI, a Python library, or a local proxy.
Works with no API keys.
Groq, Cerebras, NVIDIA NIM, Google Gemini, OpenRouter, GitHub Models, Cloudflare,
Mistral, Cohere and others each give away a free tier — but each has its own SDK,
rate limits, and daily cap. freellmpool puts them in one pool: it sends each
request to a provider you have access to, fails over to the next when one is rate
limited or down, and tracks per-day usage so you get the most out of every tier.
Two providers (Pollinations and OVHcloud) need no API key, so a fresh install
answers immediately:
$ pip install freellmpool
$ freellmpool ask "Explain the CAP theorem in one sentence."
A distributed system can guarantee at most two of consistency, availability, and
partition tolerance at the same time.
Add keys for the other providers to unlock more models and higher limits.
Run a coding agent on free models
freellmpool's proxy speaks both the OpenAI and the Anthropic API, so coding agents
run against pooled free tiers with no code changes — just point them at the proxy:
freellmpool proxy # starts http://localhost:8080
freellmpool code claude # prints the one-line setup for Claude Code
# (also: codex, aider, cline, continue, cursor, opencode)
Claude Code gateway mode can also be launched directly:
ANTHROPIC_BASE_URL=http://localhost:8080 \
ANTHROPIC_AUTH_TOKEN=dummy \
ANTHROPIC_MODEL=auto \
ANTHROPIC_SMALL_FAST_MODEL=auto \
CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 \
claude
Your existing OpenAI/Anthropic apps work the same way — set OPENAI_BASE_URL (orANTHROPIC_BASE_URL) to the proxy and keep your code unchanged.
New in 0.11: capacity tools — freellmpool capacity status shows which free
tiers are usable right now, freellmpool providers health live-probes them, andfreellmpool keys add walks you through configuring more (see
Capacity & provider health and
docs/CAPACITY.md).
New in 0.10: an async API (AsyncPool), an MCP server (freellmpool mcp),
latency-aware routing with freellmpool benchmark, observability hooks, and a
plugin system for custom providers. See the changelog.
Install
pip install freellmpool # or: pipx install freellmpool
Only dependency is httpx. Python 3.11+.
Command line
freellmpool ask "Write a haiku about sqlite"
git diff | freellmpool ask "Write a commit message for this"
freellmpool providers # which providers are configured
freellmpool models # every provider/model id
Pin a provider or model; common OpenAI/Anthropic model names are mapped to a free
equivalent so existing scripts keep working:
freellmpool ask -m groq/llama-3.3-70b-versatile "hi"
freellmpool ask -p cerebras,groq "hi"
freellmpool ask -m gpt-4o-mini "hi" # routed to a free model
As a proxy
Run a local server that speaks the OpenAI API, then point any OpenAI-compatible
tool at it:
freellmpool proxy
export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=unused
from openai import OpenAI
client = OpenAI()
print(client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "hi"}],
).choices[0].message.content)
The proxy also implements the OpenAI Responses API (for the Codex CLI) and the
Anthropic Messages API (for Claude Code), so coding agents can run on free models
too. freellmpool code <agent> prints the exact setup:
freellmpool code aider # also: claude, codex, cline, continue, cursor, opencode
Endpoints: /v1/chat/completions (token streaming, tool calling), /v1/embeddings,/v1/responses, /v1/messages, /v1/models, and a /dashboard page showing usage.
Setup snippets for specific tools are in docs/INTEGRATIONS.md
and docs/AGENTS.md.
As a library
from freellmpool import Pool
pool = Pool.from_default_config()
reply = pool.ask("Summarize the plot of Hamlet in 20 words.")
print(reply.text, "—", reply.provider_id)
vectors = pool.embed(["first document", "second document"]).vectors
Async is the same API with await:
from freellmpool import AsyncPool
async with AsyncPool.from_default_config() as pool:
reply = await pool.aask("Summarize the plot of Hamlet in 20 words.")
Pass on_event=... to either pool to receive structured routing events
(attempt/success/error/cooldown/exhausted) for logging or tracing. Add
your own endpoint with register_provider(...), or a new request shape withregister_adapter(name, fn).
Benchmark your providers
freellmpool benchmark times one call per configured provider and prints
latency and success, so you can see which of your free tiers are fastest right
now. The router learns the same latency/success signal from real traffic as it
runs; set FREELLMPOOL_ROUTING=fast to prefer the lowest-latency provider
instead of the default least-used-first.
$ freellmpool benchmark
provider/model status latency note
cerebras/llama-3.3-70b ok 180 ms 6 tok
groq/llama-3.3-70b ok 240 ms 6 tok
ovh/Meta-Llama-3_3-70B FAIL - HTTP 429
Capacity & provider health
Free tiers drift through the day — keys expire, providers go down, daily caps
fill. These commands tell you what's usable right now and what to set up next:
freellmpool capacity status --target 5 # who's healthy / near quota / missing a key
freellmpool providers health # send one tiny request to each, time it
freellmpool keys checklist --target 5 # which keys to add to reach N healthy providers
freellmpool keys add groq # configure a key (and record metadata)
capacity status is local-first: it reads your catalog, environment, and
per-day quota counters and labels each provider healthy, low_quota,exhausted, invalid_key, or missing. It also syncs an advisory external
catalog (mnfst/awesome-free-llm-apis)
to suggest free providers you could add — advisory only; your providers.toml
stays the source of truth for routing. keys add <name> can even import a
suggested provider from that catalog or create an OpenAI-compatible stub and
autodiscover its models. The proxy /dashboard shows the same capacity at a
glance. Full reference: docs/CAPACITY.md.
As an MCP server
freellmpool mcp runs a Model Context Protocol server over stdio, so Claude
Desktop, Claude Code, or Cursor can hand subtasks to free models. See
docs/MCP.md. A server.json is included for the
MCP registry.
In Simon Willison's llm CLI
There's a plugin: llm install llm-freellmpool → llm -m freellmpool "..." with
no API key. Source: 0xzr/llm-freellmpool.
Provider keys
freellmpool reads keys from the environment and uses whatever is set. None are
required. Step-by-step signup links for each (all free, no card) are in
docs/ACCOUNTS.md.
| Provider | Env var | Notes |
|---|---|---|
| Pollinations | — | no key needed |
| OVHcloud | — | no key needed (anonymous tier) |
| LLM7 | LLM7_API_KEY |
optional |
| Groq | GROQ_API_KEY |
fast |
| Cerebras | CEREBRAS_API_KEY |
fast, large daily cap |
| NVIDIA NIM | NVIDIA_API_KEY |
|
| OpenRouter | OPENROUTER_API_KEY |
free models |
| Google Gemini | GEMINI_API_KEY |
|
| GitHub Models | GITHUB_TOKEN |
any PAT |
| Cloudflare | CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID |
|
| Mistral, Cohere, SambaNova, Z.ai, Ollama Cloud, LongCat | see .env.example |
A config.toml (see config.toml.example) can hold keys,
model aliases, and settings instead of env vars.
How routing works
For each request, freellmpool builds the list of (provider, model) pairs you
have access to, then orders providers least-used-first and picks a least-used
model inside that provider. This keeps providers with large catalogs, like
NVIDIA, from receiving more traffic only because they expose more models. A
provider that returns a 429 is set aside for a cooldown window. Daily counts are
kept in ~/.config/freellmpool/quota.json and reset at UTC midnight.
Every call records latency and success per model target. A provider whose targets
are currently failing sinks to the back automatically; withFREELLMPOOL_ROUTING=fast the fastest measured provider goes first instead.freellmpool benchmark warms these metrics on demand. To restore the old
per-model balancing behavior, set FREELLMPOOL_ROUTING=legacy orFREELLMPOOL_ROUTING=model (or FREELLMPOOL_ROUTING=model-fast for the old
per-model fastest-first ordering).
Quality routing (FREELLMPOOL_ROUTING=quality). Free tiers' strongest models
have the smallest daily caps, so a naive pool gets weaker as the day fills. Quality
routing matches each prompt's difficulty to each model's capability: hard
prompts (long input, code, reasoning cues) go to the strongest available model, and
easy ones go to lightweight models — which rations scarce strong-model quota so the
pool stays sharp for longer. Capability is grounded in real benchmark data, not
guessed from names; models no benchmark lists fall back to a name heuristic.
The bundled, offline scores come from LMArena Elo (an
MIT-licensed snapshot) and the Aider code-editing
leaderboard (Apache-2.0), normalized to a common percentile scale. For much
broader coverage, run freellmpool capability sync with a free
Artificial Analysis API key
(FREELLMPOOL_AA_API_KEY) — its Intelligence Index covers most current and
open-weight models and takes precedence. The fetched AA data is cached locally
under your own key (never bundled, per AA's terms). freellmpool capability status shows current coverage. Scores via LMArena and Aider; intelligence index
via Artificial Analysis when keyed.
Context windows. Free models often have small context windows. freellmpool
never truncates your input; instead, when a model rejects a request as too long,
it learns that model's limit and stops routing oversized requests there, escalating
only to larger-window models. If nothing fits it raises a clearContextWindowExceeded (with the estimated input size) instead of a generic
failure — over the proxy that's a 413. You can declare a model's window withcontext = N in providers.toml to skip it proactively.
Architecture notes: docs/ARCHITECTURE.md.
Limitations
- Free-tier models are smaller than frontier models. They're good for drafting,
summarizing, classification, triage, and everyday coding — not a replacement
for GPT-class reasoning on hard problems. - Quality and capacity vary through the day as high-cap tiers exhaust; limits
reset at UTC midnight. - Free tiers change without notice. When a model id or limit goes stale, a
one-line PR toproviders.tomlfixes it for everyone. - The proxy is meant for local/single-user use. It binds to
127.0.0.1by
default; if you expose it, set a key (--api-key). - The Claude Code / Anthropic path is experimental (text and tool use; no vision).
- These are free tiers shared by everyone — don't abuse them.
How it compares
| Tool | What it is | Install | Keyless start | CLI / library / proxy / MCP |
|---|---|---|---|---|
| freellmpool | Pools many providers' free tiers | pip install |
Yes (2 providers) | All four |
| OpenRouter | Hosted paid aggregator (some free models) | API key | No | API only |
| LiteLLM | Multi-provider SDK/proxy (bring your own keys) | pip install |
No | Library + proxy |
| Self-hosted free-API servers | A server you deploy | Docker + config | No | Server only |
freellmpool's niche is the keyless, pip-installable client for squeezing the
hosted free tiers — not a server you deploy, and not a paid aggregator.
FAQ
Is there a free, OpenAI-compatible LLM API gateway? Yes — freellmpool is a free,
MIT-licensed gateway that exposes one OpenAI-compatible endpoint backed by the free
tiers of 16 providers. pip install freellmpool and point any OpenAI client at the
local proxy.
How do I use multiple free LLM APIs at once? freellmpool pools them: each request
goes to a provider you have access to, fails over to the next when one is rate-limited
or down, and tracks per-day usage so load spreads across tiers.
Can I run Claude Code or Codex on free models? Yes — the proxy speaks both the
OpenAI and Anthropic APIs. Set OPENAI_BASE_URL or ANTHROPIC_BASE_URL to the proxy
and run Codex, Claude Code, aider, Cline, Continue, or Cursor unchanged. For Claude
Code, set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 so /v1/models is discovered
through the Anthropic bridge. See freellmpool code <agent>. (Claude Code path is
experimental: text + tools, no vision.)
Do I need an API key? No — Pollinations and OVHcloud work with no key, so a fresh
install answers immediately. Add free keys for the other providers for more models and
higher limits.
Is it free and open source? Yes, MIT-licensed. More at the
project page.
Contributing
New providers and fixes to stale limits are the most useful contributions, and
both are usually a small change to providers.toml. See
CONTRIBUTING.md. Tests run with no network access:
pip install -e ".[dev]" && pytest && ruff check src tests
License
MIT
Reviews (0)
Sign in to leave a review.
Leave a reviewNo results found