Turbo-LLM

skill
Security Audit
Fail
Health Warn
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 9 GitHub stars
Code Fail
  • fs module — File system access in turbollm/package.json
  • Hardcoded secret — Potential hardcoded credential in turbollm/src/api/routes.ts
  • exec() — Shell command execution in turbollm/src/auth.ts
  • network request — Outbound network request in turbollm/src/bench/bench.ts
  • network request — Outbound network request in turbollm/src/chat/chat-routes.ts
  • exec() — Shell command execution in turbollm/src/chat/db.ts
  • process.env — Environment variable access in turbollm/src/cli-launch.ts
  • network request — Outbound network request in turbollm/src/cli-launch.ts
Permissions Pass
  • Permissions — No dangerous permissions requested

SUMMARY

Run any local LLM engine, auto-tuned to your GPU — polished web UI + OpenAI/Anthropic-compatible API. Point Claude Code at your own machine in one command. No Electron, no Python, offline-first.

README.md

TurboLLM

Run any local LLM engine, auto-tuned to your GPU — with a polished web UI and an OpenAI/Anthropic-compatible API.
Bring your own llama.cpp fork. No compiling. No Electron. No Python. Point Claude Code at your own machine in one command — fully offline.

npx turbollm

One command starts a local daemon, opens a browser UI, and serves your models over an API any
tool can talk to. TurboLLM is the performance & bleeding-edge layer for local LLMs — for
people who today hand-compile forks and hunt forums for the right flags.

How TurboLLM works: clients -> one lightweight daemon -> any engine on your GPU

Why it's different

  • 🔌 Any engine, including forks. Point it at any llama-server-compatible binary — a
    build you compiled, a community fork, or the one it auto-provisions for your GPU. No other
    local-LLM app does this. This is the whole point.
  • ⚡ Auto-tuned to your hardware — benchmarks on load, derives fast launch flags, shows a
    VRAM-fit verdict before you load.
  • 📊 Real measured tokens/sec — live while you chat, remembered per model. Never faked.
  • 🪶 Lightweight — a ~0.3 MB npm package on Node. No Electron, no Chromium, no Python.
  • 🔌 OpenAI + Anthropic APIs — run Claude Code on your own GPU in one command.
  • 🔒 Offline-first & private — no account, no backend, no telemetry.
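A measured tokens/sec figure of the kind described above presumably comes down to dividing generated tokens by wall-clock generation time and keeping the latest value per model. The TypeScript sketch below shows only that arithmetic. `ThroughputSample`, `tokensPerSecond`, and `ThroughputLog` are hypothetical names chosen for illustration and are not taken from TurboLLM's source.

```typescript
// Hypothetical sketch: these names are illustrative, not TurboLLM's API.
interface ThroughputSample {
  tokens: number;    // tokens generated during the run
  elapsedMs: number; // wall-clock generation time in milliseconds
}

// Tokens per second for one run; returns 0 for an empty or zero-length run.
function tokensPerSecond(sample: ThroughputSample): number {
  if (sample.tokens <= 0 || sample.elapsedMs <= 0) return 0;
  return sample.tokens / (sample.elapsedMs / 1000);
}

// Remembers the most recent measured rate for each model name.
class ThroughputLog {
  private readonly lastRate = new Map<string, number>();

  record(model: string, sample: ThroughputSample): number {
    const rate = tokensPerSecond(sample);
    this.lastRate.set(model, rate);
    return rate;
  }

  get(model: string): number | undefined {
    return this.lastRate.get(model);
  }
}

const log = new ThroughputLog();
// 256 tokens in 4 seconds is 64 tokens/sec.
log.record("example-model", { tokens: 256, elapsedMs: 4000 });
```

Deriving the rate from counted tokens and elapsed time, rather than from an estimate, is what keeps the number a measurement.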

Install

npm install -g turbollm   # or just: npx turbollm
turbollm                  # start on http://127.0.0.1:6996, open the UI
turbollm launch claude    # run Claude Code against your loaded model

Requires Node.js 22+. Works on Windows, macOS, and Linux.
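
Once the daemon is running, an OpenAI-style client request might be assembled as in the sketch below. It assumes the conventional OpenAI path `/v1/chat/completions` on the default address shown above; the routes TurboLLM actually serves are documented in turbollm/README.md. `buildChatRequest` and the model name are hypothetical.

```typescript
// Assumption: the daemon serves the conventional OpenAI-style path below.
// Check turbollm/README.md for the routes it really exposes.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a chat-completion request without sending it (hypothetical helper).
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const url = baseUrl.replace(/\/+$/, "") + "/v1/chat/completions";
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages }),
  };
}

const req = buildChatRequest("http://127.0.0.1:6996", "my-local-model", [
  { role: "user", content: "Hello" },
]);
// To send it:
// await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```

Keeping request construction separate from the network call makes the payload easy to inspect before anything leaves the client.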

How it compares

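TurboLLM is compared against LM Studio, Ollama, and Open WebUI on the following criteria (the per-tool marks are not reproduced here):

  • Run any engine / forks
  • Auto-tune flags to your GPU
  • Anthropic API → Claude Code
  • Use existing model folders
  • Lightweight (no Electron/Python)
  • Offline · no telemetry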

📖 Full documentation

The complete catalogue + manual — every feature, the API, CLI reference, tuning, and how to
add a custom engine — lives in turbollm/README.md.

License

Source-available under the Functional Source License 1.1 (Apache-2.0 future grant) — SPDX
FSL-1.1-ALv2. Free for personal, internal-business, educational, and research use; only
shipping a competing product is restricted. Converts to Apache-2.0 two years after each
release. Full text: turbollm/LICENSE.md.
