go-llm-proxy
Health — Warning
- License — MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 6 GitHub stars
Code — Warning
- network request — Outbound network request in cloudflare-counter/src/index.js
Permissions — Passed
- Permissions — No dangerous permissions requested
This tool is a single-binary proxy that connects AI coding assistants to local and cloud-based LLMs. It handles protocol translation, model routing, and adds missing capabilities like web search, PDF processing, and image description.
Security Assessment:
The proxy forwards prompts to external LLM APIs and makes outbound network requests to search providers (Tavily, Brave). A code scan flagged an outbound request in `cloudflare-counter/src/index.js`, which appears unrelated to the core Go binary and may be part of a deploy script or analytics shim—worth a closer look before deploying. The README describes solid security practices: constant-time auth, rate limiting, SSRF protection, and path allowlisting. No dangerous permissions or hardcoded secrets were found. Because it handles API keys for upstream providers and logs usage to SQLite, it does access sensitive data. Overall risk: Medium (due to the sensitive data handling and the unexplained Cloudflare network call).
Quality Assessment:
The project is actively maintained (last push was today) and released under the permissive MIT license. However, it has very low community visibility at only 6 GitHub stars, meaning it has not been widely reviewed or battle-tested. The documentation is thorough, with a clear README and a web-based config generator.
Verdict:
Use with caution—security design looks thoughtful and the project is active, but low community adoption and the flagged network request mean you should review the code yourself before relying on it for sensitive workflows.
Lightweight proxy for LLMs
go-llm-proxy
A single-binary LLM proxy that connects coding assistants and AI agents to local and upstream models. Translates between API protocols, routes requests across backends, and adds tools that local backends lack — web search, image description, PDF text extraction, and OCR. Works with Claude Code, Codex, OpenCode, Qwen Code, OpenClaw, and any OpenAI/Anthropic-compatible client.
Landing page · Config generator · Releases
Common use cases
- You need data security: you self-host models, or your approved upstream vendors (Azure, Bedrock, etc.) lack the tooling you're used to.
- You want glm-5.1 for planning and MiniMax-M2.5 for implementation and agent work, with Qwen3-VL-8B as your vision processor.
- You want to connect with Claude Code and Codex and have it just work: upload a PDF and it works; upload an image and that works too.
- Your assistant calls for a web search? The proxy intercepts it natively and sends it through Tavily or Brave.
What it does
- Protocol translation — Claude Code speaks Anthropic Messages. Codex speaks OpenAI Responses. Your vLLM speaks Chat Completions. The proxy translates between them automatically.
- Model multiplexing — Aggregate local GPU servers, cloud APIs, and third-party providers behind one endpoint. Clients see one model list.
- API key management — Issue proxy keys with per-key model restrictions. Backend credentials stay on the server.
- Vision pipeline — Images sent to text-only models are described by a vision-capable model and replaced with text. Transparent to the client.
- PDF processing — Text extraction for native PDFs. OCR via dedicated model for scanned documents. Results cached across turns.
- Web search — When coding assistants request web search, the proxy executes it via Tavily or Brave Search (auto-detected from key prefix) and injects the results. No client-side MCP setup needed.
- MCP endpoint — `/mcp/sse` exposes web search for OpenCode, Qwen Code, and any MCP-compatible agent.
- Usage monitoring — Per-request logging to SQLite. Web dashboard, CLI reports, per-user/model breakdowns.
- Config generator — Built-in web UI creates ready-to-use configs for Claude Code, Codex, OpenCode, and Qwen Code. Also available standalone.
- Context window detection — Auto-queries backends at startup. Manual override per model.
- Hot reload — Config reloads on file save or SIGHUP. Add models or rotate keys without restarting.
- Security — Constant-time auth, IP rate limiting, SSRF protection, sanitized error responses, path allowlisting.
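Protocol translation is the core mechanic behind the list above. As an illustration only (the real translator covers tools, streaming, images, and many more fields; the struct shapes below are trimmed-down stand-ins), a sketch of mapping an Anthropic Messages request onto OpenAI Chat Completions could look like:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subsets of the two wire formats, for illustration only. The actual
// proxy handles tools, streaming, images, and multi-part content as well.
type anthropicReq struct {
	Model     string `json:"model"`
	MaxTokens int    `json:"max_tokens"`
	System    string `json:"system,omitempty"`
	Messages  []msg  `json:"messages"`
}

type msg struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatReq struct {
	Model     string `json:"model"`
	MaxTokens int    `json:"max_tokens,omitempty"`
	Messages  []msg  `json:"messages"`
}

// toChatCompletions moves the Anthropic top-level system prompt into the
// leading system message that Chat Completions expects, then copies the rest.
func toChatCompletions(a anthropicReq) chatReq {
	out := chatReq{Model: a.Model, MaxTokens: a.MaxTokens}
	if a.System != "" {
		out.Messages = append(out.Messages, msg{Role: "system", Content: a.System})
	}
	out.Messages = append(out.Messages, a.Messages...)
	return out
}

func main() {
	in := anthropicReq{
		Model:     "my-model",
		MaxTokens: 1024,
		System:    "You are a coding assistant.",
		Messages:  []msg{{Role: "user", Content: "hello"}},
	}
	b, _ := json.MarshalIndent(toChatCompletions(in), "", "  ")
	fmt.Println(string(b))
}
```

The interesting asymmetry is that Anthropic carries the system prompt as a top-level field while Chat Completions represents it as the first message; translation in the other direction has to peel that message back off.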
Quick start
```sh
./go-llm-proxy -config config.yaml
```
Or with Docker (limited testing):
```sh
docker compose -f docker/docker-compose.yml up -d
```
Minimum config
```yaml
listen: ":8080"

models:
  - name: my-model
    backend: http://192.168.1.10:8000/v1

keys:
  - key: sk-your-secret-key
    name: admin
```
See config.yaml.example for a fully annotated starter config with all options.
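Beyond the minimum, a multi-backend setup with per-key model restrictions might look like the sketch below. The field names for restrictions are illustrative assumptions; config.yaml.example is the authoritative schema.

```yaml
listen: ":8080"

models:
  - name: planner            # served by a cloud provider
    backend: https://api.example.com/v1
  - name: coder              # served by a local vLLM instance
    backend: http://192.168.1.10:8000/v1

keys:
  - key: sk-admin-key
    name: admin              # no restriction: sees every model
  - key: sk-ci-key
    name: ci
    models: [coder]          # hypothetical per-key restriction field
```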
Compatibility matrix
What works with each coding assistant through the proxy.
Protocol
| | Claude Code | Codex CLI | OpenCode | Qwen Code |
|---|---|---|---|---|
| Native API | Anthropic Messages | OpenAI Responses | Chat Completions | Chat Completions |
| Translation | auto-translated | auto-translated | passthrough | passthrough |
Core features
| | Claude Code | Codex CLI | OpenCode | Qwen Code |
|---|---|---|---|---|
| Text + streaming | ✓ | ✓ | ✓ | ✓ |
| Tool calling | ✓ | ✓ | ✓ | ✓ |
| Multi-turn tool loops | ✓ | ✓ | ✓ | ✓ |
| Reasoning display | ✓ | ✓ | — | — |
| Extended thinking | ✓ | ✓ | — | — |
Proxy-side processing (details)
| | Claude Code | Codex CLI | OpenCode | Qwen Code |
|---|---|---|---|---|
| Web search (Tavily / Brave) | ✓ proxy | ✓ proxy | ✓ MCP | ✓ MCP |
| Image description | ✓ vision | ✓ vision | ✓ vision | ✓ vision |
| PDF text extraction | ✓ proxy | client-side | ✓ | ✓ |
| Scanned PDF / OCR | ✓ OCR model | ✓ OCR model | ✓ | ✓ |
| Context compaction | — | ✓ | — | — |
| Usage logging & reports | ✓ | ✓ | ✓ | ✓ |
Each assistant speaks a different API protocol. The proxy detects this and translates automatically — no per-model configuration needed for the common case.
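One way such detection can work is a lookup on the request path, since each dialect uses a distinctive endpoint. This is a hedged sketch, not the proxy's actual code (which may also consider headers or bodies); the paths are the standard ones for each protocol:

```go
package main

import (
	"fmt"
	"strings"
)

// Protocol names for the three API dialects the proxy translates between.
const (
	AnthropicMessages = "anthropic-messages" // Claude Code
	OpenAIResponses   = "openai-responses"   // Codex CLI
	OpenAIChat        = "openai-chat"        // OpenCode, Qwen Code, vLLM
	Unknown           = "unknown"
)

// detectProtocol maps the standard endpoint path of each dialect to a
// protocol name. Illustrative only; the real detection logic is not shown
// in this README.
func detectProtocol(path string) string {
	switch {
	case strings.HasSuffix(path, "/messages"):
		return AnthropicMessages
	case strings.HasSuffix(path, "/responses"):
		return OpenAIResponses
	case strings.HasSuffix(path, "/chat/completions"):
		return OpenAIChat
	default:
		return Unknown
	}
}

func main() {
	for _, p := range []string{"/v1/messages", "/v1/responses", "/v1/chat/completions"} {
		fmt.Printf("%s -> %s\n", p, detectProtocol(p))
	}
}
```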
Processing pipeline
Optional. Handles content that local backends don't support natively:
```yaml
processors:
  vision: Qwen3-VL-8B      # vision model for image descriptions
  ocr: paddleOCR           # fast model for PDF page text extraction (optional, falls back to vision)
  web_search_key: tvly-... # Tavily or Brave Search key (auto-detected from prefix)
```
Without processors, the proxy just translates and routes. With it, images, PDFs, and search work on text-only backends.
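To illustrate the vision step in isolation: image parts of a message are swapped for text descriptions before the message reaches a text-only backend. The types and the `describe` hook below are hypothetical stand-ins for the proxy's internal pipeline, not its real API:

```go
package main

import "fmt"

// part is one content block in a multimodal message; illustrative types only.
type part struct {
	Type string // "text" or "image"
	Data string // text content, or an image reference for image parts
}

// describeImage stands in for a call to the configured vision model; the
// real proxy would send the image to e.g. Qwen3-VL-8B and read back a
// description.
type describeImage func(image string) string

// replaceImages rewrites image parts as text descriptions so a text-only
// backend can consume the message unchanged; the client never notices.
func replaceImages(parts []part, describe describeImage) []part {
	out := make([]part, 0, len(parts))
	for _, p := range parts {
		if p.Type == "image" {
			out = append(out, part{Type: "text", Data: "[image: " + describe(p.Data) + "]"})
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	message := []part{
		{Type: "text", Data: "What does this chart show?"},
		{Type: "image", Data: "chart.png"},
	}
	// Stub vision model for the demo.
	stub := func(img string) string { return "a bar chart of monthly revenue" }
	for _, p := range replaceImages(message, stub) {
		fmt.Printf("%s: %s\n", p.Type, p.Data)
	}
}
```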
Recommended processor models
| Processor | Model | Notes |
|---|---|---|
| Vision | Qwen3-VL-8B | Best quality/speed balance for image description. Handles charts, screenshots, diagrams. |
| OCR | PaddleOCR-VL-1.5 (0.9B) | Purpose-built for documents. 94.5% accuracy, 109 languages, ~2s/page. Tiny VRAM footprint. |
| Web search | Tavily or Brave Search | Tavily free: 1,000 req/month. Brave free: $5/month credit. Auto-detected from key prefix. |
Documentation
| Topic | Link |
|---|---|
| Configuration reference | docs/config-reference.md |
| Claude Code | docs/claude-code.md |
| Codex CLI | docs/codex.md |
| OpenCode | docs/opencode.md |
| Qwen Code | docs/qwen-code.md |
| Processing pipeline | docs/pipeline.md |
| Docker deployment | docs/docker.md |
| Production deployment | docs/deployment.md |
| Usage monitoring | docs/usage.md |
| Security | docs/security.md |