waifu-code
Health: Warn
- No license — Repository has no license file
- Description — Repository has a description
- Active repo — Last push today
- Low visibility — Only 9 GitHub stars
Code: Warn
- process.env — Environment variable access in src/cli.ts
- process.env — Environment variable access in src/config.ts
- process.env — Environment variable access in src/providers/base.ts
- network request — Outbound network request in src/providers/base.ts
Permissions: Pass
- Permissions — No dangerous permissions requested
This tool acts as a local proxy that intercepts API requests from the official Claude Code CLI and transparently routes them through alternative AI providers (like NVIDIA NIM, OpenRouter, or Ollama), allowing users to access different models or free tiers.
Security Assessment
The overall risk is rated as Medium. The tool requires you to pass or save API keys, which it stores locally in `~/.waifu/config.json` and accesses via environment variables. It does not request explicitly dangerous system permissions and no hardcoded secrets were found. However, it functions by making outbound network requests to third-party AI provider APIs. While this is its intended purpose, users must be comfortable trusting a small, unaudited proxy with their API keys and prompt data.
Quality Assessment
The project is actively maintained, with its most recent push occurring today. However, it has low community visibility, with only 9 GitHub stars, and the repository lacks a license file. This means that, strictly speaking, all rights are reserved by the author, and the legal terms under which you can use, modify, or distribute the code are undefined.
Verdict
Use with caution — it is actively maintained, but the lack of a license and low community adoption means you should inspect the code yourself before routing sensitive API keys or proprietary data through it.
Use Claude Code for free.
waifu CLI
A streamlined proxy and wrapper for Claude Code that transparently routes API requests through your choice of AI provider, eliminating the need for complex Python proxy setups.
By default, it uses NVIDIA NIM with moonshotai/kimi-k2-thinking for intelligent responses and natively supports Anthropic streaming APIs.
Installation
First, ensure you have the official Claude Code CLI installed:
npm install -g @anthropic-ai/claude-code
Note: Claude Code has moved to a native installer in recent versions. If you see a prompt to update, follow it — it won't break waifu.
Then, install the waifu-code CLI globally:
npm install -g waifu-code
Providers
waifu now supports multiple AI providers. The original NVIDIA NIM is the default, but you can switch with a single flag:
| Provider | Free? | Works with Claude Code? | Notes |
|---|---|---|---|
| NVIDIA NIM (default) | Free tier | ✓ Yes | Best free option, high token limits |
| OpenRouter | Free models available | ✓ Yes | Recommended — no token size limits |
| Ollama | Completely free | ⚠ Depends | Needs 32b+ for reliable tool use |
Usage
Simply run:
waifu
On first launch, it will prompt you for your API key. This key is saved automatically to ~/.waifu/config.json.
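For reference, the saved file is plain JSON; the shape below is inferred from the CLI flags (`--provider`, `--key`, `--model`) and is an assumption, not a transcript of `src/config.ts`:

```ts
// Hypothetical shape of ~/.waifu/config.json, inferred from the CLI flags.
// Field names are assumptions and may differ from the real src/config.ts.
interface WaifuConfig {
  provider: "nim" | "openrouter" | "ollama";
  key?: string;   // API key for the chosen provider
  model?: string; // overrides the per-provider default model
}
```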
waifu immediately starts the integrated TypeScript proxy in the background on an automatically chosen free port, then launches your locally installed claude-code CLI against it. No manual configuration or environment-variable edits are required!
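Wrappers like this usually avoid touching your shell by setting the override only in the child process environment. A minimal sketch of that pattern, assuming `ANTHROPIC_BASE_URL` is the override Claude Code honors (a standard Claude Code variable, but its use here is inferred, not read from this repo):

```ts
import { spawn } from "node:child_process";
import { createServer } from "node:http";
import type { AddressInfo } from "node:net";

// Listen on port 0 so the OS picks a free port, then launch the real claude
// CLI with its API base URL pointed at the local proxy. Only the child
// process sees the override; the parent shell is untouched.
const proxy = createServer((req, res) => {
  // Stub: the real proxy translates each request for the chosen provider.
  res.statusCode = 502;
  res.end("not implemented in this sketch");
});

proxy.listen(0, "127.0.0.1", () => {
  const { port } = proxy.address() as AddressInfo;
  spawn("claude", process.argv.slice(2), {
    stdio: "inherit",
    env: { ...process.env, ANTHROPIC_BASE_URL: `http://127.0.0.1:${port}` },
  });
});
```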
Using a different provider
# NVIDIA NIM (original default)
waifu --provider nim --key nvapi-xxx
# OpenRouter — recommended free option
waifu --provider openrouter --key sk-or-xxx --model openrouter/free
# OpenRouter with a specific model
waifu --provider openrouter --key sk-or-xxx --model nvidia/nemotron-3-super-120b-a12b:free
# Ollama — fully local, no key needed
waifu --provider ollama --model qwen2.5:32b
Options
Usage: waifu [options]
Run the Claude Code CLI through your chosen AI provider proxy.
Options:
-v, --version Output the current version
--provider <n> AI provider: nim, openrouter, ollama (default: nim)
--key <key> API key for the chosen provider (saved automatically)
--nim-key <key> NVIDIA NIM API key (shorthand)
--openrouter-key <key> OpenRouter API key (shorthand)
--model <model> Model to use (overrides per-provider default)
--port <port> Port to run the proxy on (default: auto)
--proxy-only Start only the proxy server without launching claude
--no-waifu Disable the waifu overlay
--verbose Enable verbose logging for debugging
-h, --help Display help for command
Commands:
model Interactively select a new model for the current provider
config View or update saved configuration
providers List all supported providers and default models
Saving your config
# Save provider + model so you never have to type flags again
waifu config --provider openrouter --model openrouter/free
# View current saved config
waifu config
# Reset everything
waifu config --reset
Provider notes
OpenRouter
The most reliable free option. openrouter/free automatically picks from all currently available free models:
waifu --provider openrouter --key sk-or-xxx --model openrouter/free
Free model names change over time — if you get a 404 on a specific model name, switch to openrouter/free or check openrouter.ai/models for models marked :free.
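If you prefer to script that check, OpenRouter's public model listing can be filtered for the `:free` suffix. A small Node sketch (the `/api/v1/models` endpoint and response shape come from OpenRouter's public API, not this repo; requires Node 18+ for global `fetch`):

```ts
// Print OpenRouter model ids that are currently free (":free" suffix).
const res = await fetch("https://openrouter.ai/api/v1/models");
const { data } = (await res.json()) as { data: { id: string }[] };
for (const model of data) {
  if (model.id.endsWith(":free")) console.log(model.id);
}
```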
Recommended free models that work well with Claude Code:
- `nvidia/nemotron-3-super-120b-a12b:free` — large, reads files autonomously, good tool use
- `deepseek/deepseek-r1:free` — strong reasoning
- `openrouter/free` — auto-selects, always available
Ollama (local)
Ollama runs models entirely on your machine — no internet, no API key, no cost.
Setup:
- Download from ollama.com
- Pull a model: `ollama pull qwen2.5:32b`
- On Linux, start the server first: `ollama serve`
Model guide by RAM:
| RAM | Recommended | Command |
|---|---|---|
| 16GB | qwen2.5:14b or mistral-nemo | ollama pull qwen2.5:14b |
| 32GB | qwen2.5:32b | ollama pull qwen2.5:32b |
| 64GB+ | qwen2.5:72b | ollama pull qwen2.5:72b |
Known limitations with small models (below 32b):
- Models may hallucinate tool names not in Claude Code's schema (e.g. `Glob`, `simplify`, `GloballySearch`) — these silently do nothing
- Models tend to ask clarifying questions instead of reading files autonomously
- Tool call formatting is inconsistent
waifu handles one common Ollama issue automatically: some models output tool calls as raw JSON text instead of the structured API format. The proxy detects and converts both formats, as sketched after this list:
- Bullet+XML: `● <function=Name><parameter=key>value</parameter>`
- Plain JSON: `{ "name": "ToolName", "arguments": { ... } }`
For reliable agentic use locally, 32b+ models are recommended.
How It Works
This tool is a drop-in replacement for the original Python proxy server. It relies on a hyper-efficient native Node.js implementation that:
- Intercepts Anthropic Server-Sent Events (SSE).
- Converts Claude's messages format to OpenAI-compatible JSON (which all supported providers speak).
- Fixes Anthropic API quirks (like `?beta=true` queries and streaming header strictness) seamlessly under the hood.
- Auto-detects and preserves `<think>` tags returned by supported models without breaking the CLI UI experience.
- Short-circuits trivial requests (quota checks, title generation, suggestion mode) locally without hitting any API.
- Detects raw-text tool calls from local models and converts them to proper tool use blocks.
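The messages conversion in the second step is the core of the proxy. A simplified sketch of that translation for plain text content (real code must also handle system prompts, tool_use/tool_result blocks, and streaming deltas; the names here are illustrative):

```ts
type AnthropicMessage = {
  role: "user" | "assistant";
  content: string | { type: "text"; text: string }[];
};

type OpenAIMessage = { role: "user" | "assistant"; content: string };

// Flatten Anthropic content blocks into the single-string content field
// that OpenAI-compatible chat endpoints expect.
function toOpenAI(messages: AnthropicMessage[]): OpenAIMessage[] {
  return messages.map((m) => ({
    role: m.role,
    content:
      typeof m.content === "string"
        ? m.content
        : m.content
            .filter((block) => block.type === "text")
            .map((block) => block.text)
            .join("\n"),
  }));
}
```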