model-maestro

skill
Security Audit: Warn

Health: Warn
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars

Code: Pass
  • Code scan — Scanned 12 files during light audit; no dangerous patterns found

Permissions: Pass
  • Permissions — No dangerous permissions requested
Purpose
This tool acts as a unified gateway and proxy for multiple Large Language Model (LLM) providers, allowing developers to route, load-balance, and manage AI requests through a single API with an included admin dashboard.

Security Assessment
The light code scan found no hardcoded secrets, dangerous permissions, or malicious execution patterns. Because it is a proxy server by design, it naturally makes network requests to external LLM APIs. It requires a PostgreSQL database and Redis, meaning it handles sensitive infrastructure data and routes conversational AI prompts. Given the nature of an API gateway, you should ensure any deployment is properly secured behind authentication. Overall risk: Low.

Quality Assessment
The project is actively maintained, with its most recent push happening today. However, there are significant concerns regarding maturity. The repository lacks a license file (the README declares MIT, but no LICENSE file is present), so there are no formal terms for usage, modification, or distribution. Additionally, it has very low community visibility, with only 5 GitHub stars, indicating it has not yet been widely peer-reviewed or battle-tested by the open-source community.

Verdict
Use with caution: the code appears safe and actively updated, but the lack of a license and low community adoption means it should be evaluated for internal use only until it matures.
SUMMARY

Unified LLM Gateway that proxies multiple providers (Ollama, OpenAI-compatible) behind a single API. Enables IDEs and tools to access multiple models via standard API formats. Manage LLM usage with per-user token limits, request logging, load balancing, model groups, and an admin dashboard.

README.md

Model Maestro

Config-driven Unified LLM Gateway

Route, load-balance and manage Ollama, OpenAI and other LLM providers through a single authenticated API. Model Maestro gives you user-based access control, model mapping, token usage tracking, health-checked node pooling and a modern Next.js admin dashboard — all wired to PostgreSQL + Redis.

Quick Start · Features · Architecture · API · Admin Panel


Quick Start

Requires Docker & Docker Compose.

# 1. Clone
git clone <repository-url> && cd model-maestro

# 2. Configure
cp .env.example .env

# 3. Launch full stack (PostgreSQL + Redis + FastAPI + Next.js)
docker compose -f docker-compose.dev.yml up --build -d

# 4. Seed the database
docker exec maestro python -m app.seeder

# 5. Open the admin panel at http://localhost:3000
Service           URL                              Notes
API               http://localhost:8000            FastAPI gateway
Admin Dashboard   http://localhost:3000            Next.js admin panel
API Docs          http://localhost:8000/api/docs   Basic-auth protected
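
The API docs are served behind HTTP basic auth using the DOCS_USERNAME / DOCS_PASSWORD values from .env; for example, with the defaults:

curl -u admin:admin http://localhost:8000/api/docs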

For a more detailed setup guide, see docs/SETUP.md.


Features

  • JWT Authentication — Bearer-token auth on every LLM request.
  • Admin Dashboard — Next.js 16 panel for visual management of users, nodes, models, groups and audit logs.
  • Model Mapping — Translate display names (gpt-oss:120b) to real names (gpt-oss:120b-cloud) via PostgreSQL with JSON-file caching.
  • Node-Scoped Model Mappings — Bind a mapping to a specific node so the same display name can resolve to different real names on different backends.
  • Node-Scoped Routing via Model Prefix — Force a request to a specific node by prefixing the model name: node:trmix:kimi-k2.6:latest routes directly to the node with code trmix.
  • Multi-Node Load Balancing — Round-robin, weighted and priority-based strategies across Ollama and vLLM nodes.
  • vLLM Support — Native vLLM (OpenAI-compatible) node type with automatic health checks, model discovery and Authorization: Bearer header forwarding.
  • Model Groups — Group models into logical units with fallback chains. Requests dynamically resolve to the best member based on capability tags (vision, tools) and strategy.
  • Node Health Management — Automatic health checks, model discovery and availability tracking for both Ollama and vLLM nodes.
  • Per-Node Warmup Toggle — Enable or disable model warmup per node via admin UI.
  • Drag-and-Drop Node Priority — Reorder node cards in the admin panel to update fallback priority visually.
  • User-Level Access Control — Per-user model allowlists and rate limits (requests / tokens per day).
  • Token Usage Tracking — Background-batched activity logs with prompt / completion / total token breakdowns, plus request source identification (Cursor, Claude, OpenClaw, Grafana, etc.).
  • Tool Set Filtering — Restrict which tools a model is allowed to invoke via configurable tool sets.
  • Context Length Config — Per-model context length stored in mappings (used by Cursor/Antigravity for usage bars).
  • Streaming — SSE-based streaming on /api/chat, /api/generate and /v1/chat/completions.
  • OpenAI Compatible — Drop-in /v1/chat/completions, /v1/completions, /v1/embeddings and /v1/models endpoints.
  • Full Ollama API — /api/generate, /api/chat, /api/embeddings, /api/tags, /api/show, /api/copy, /api/delete, /api/pull, /api/push, /api/create.
  • Grafana Assistant API — Full Grafana LLM Assistant compatibility endpoints (/grafana/assistant/*) for Grafana-native AI features.
  • DeepSeek Tool Call Parsing — Auto-detects and converts DeepSeek's raw XML tool call output (<tool_calls><invoke>, <CallMcpTool>, <tool_call name="...">) to OpenAI tool_calls format in streaming and non-streaming responses. Kimi/Moonshot <|tool_calls_section_begin|> format also supported.
  • Streaming-Aware Background Tasks — Health checks, model discovery and warmup defer when streams are active, preventing interruptions.
  • Node-Aware Model Warmup — Warmup requests target only models that exist on each node, eliminating 404 errors from stale model names.
  • Background Tasks — Redis-backed async queue for activity logging, node health checks, model discovery, model warmup and load cleanup.
  • Audit Logs — Every admin action is timestamped and queryable.
  • PostgreSQL + Alembic — Schema migrations run automatically on container startup.
  • Redis Cache — Hot-path caching for mappings, config and user usage data.
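
As a quick sanity check of the OpenAI-compatible surface, you can list the models visible to a user token (assuming the stack from Quick Start is running and $TOKEN holds a valid user JWT):

# Returns the user's allowed models in OpenAI /v1/models format
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer $TOKEN"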

Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Cursor     │     │  Antigravity │     │   Claude     │
│   IDE        │     │   IDE        │     │   Code       │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       └────────────────────┼────────────────────┘
                            │
                     ┌──────┴──────┐
                     │  Load       │
                     │  Balancer   │
                     └──────┬──────┘
                            │
       ┌────────────────────┼────────────────────┐
       │                    │                    │
┌──────┴──────┐    ┌────────┴────────┐    ┌──────┴──────┐
│  Ollama     │    │    Ollama       │    │   OpenAI    │
│  Node 1     │    │    Node 2       │    │   / Other   │
└─────────────┘    └─────────────────┘    └─────────────┘

Request Flow

Client Request
      │
      ▼
┌─────────────────┐
│  JWT Middleware │
└────────┬────────┘
         │
         ▼
┌─────────────────┐        ┌───────────────┐
│ Model Group?    │──No──▶ │ Model Mapper  │
│ (resolve member)│        │ (display→real)│
└────────┬────────┘        └───────┬───────┘
         │ Yes                     │
         ▼                         ▼
┌─────────────────┐        ┌───────────────┐
│ Load Balancer   │──────▶ │ Node Pool     │
│ (pick healthy)  │        │ (health check │
└────────┬────────┘        │  + retry)     │
         │                 └───────┬───────┘
         │                         │
         ▼                         ▼
┌─────────────────┐        ┌───────────────┐
│ Ollama Proxy    │◀────── │ Ollama /      │
│ (reverse map)   │        │ Provider API  │
└────────┬────────┘        └───────────────┘
         │
         ▼
    Client Response

For the full architecture documentation, see docs/ARCHITECTURE.md.


Tech Stack

Layer              Technology
API Gateway        Python 3.11, FastAPI, Uvicorn
Async HTTP         httpx (HTTP/2)
Auth               JWT (PyJWT)
Database           PostgreSQL 15 + asyncpg + SQLAlchemy async
Migrations         Alembic
Cache              Redis 7
Frontend           Next.js 16, React 19, Tailwind CSS v4, shadcn/ui
Background Tasks   Redis-backed async queue
Deployment         Docker, Docker Compose

Configuration

Copy .env.example to .env and set:

# Ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434

# App
JWT_SECRET_KEY=change-this-to-a-strong-secret
LOG_LEVEL=INFO

# PostgreSQL
DATABASE_URL=postgresql+asyncpg://maestro_user:maestro_password@postgres:5432/maestro

# Redis
REDIS_URL=redis://redis:6379/0

# Admin Token (for /admin/* endpoints)
ADMIN_TOKEN=change-this-for-production

# Admin Panel Login
ADMIN_USERNAME=admin
ADMIN_PASSWORD=admin

# Swagger / ReDoc Basic Auth
DOCS_USERNAME=admin
DOCS_PASSWORD=admin
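
For production, replace every change-this and admin default with a strong random value. A minimal sketch, assuming openssl is available:

# Generate random secrets for JWT_SECRET_KEY and ADMIN_TOKEN
openssl rand -hex 32
openssl rand -hex 32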

Admin Panel

The Next.js dashboard (http://localhost:3000) provides a visual interface for every part of the gateway.

Page                What you can do
Dashboard           Node health, model counts, user statistics
Users               Create users, manage tokens, assign models, set limits
Nodes               Add/edit Ollama and vLLM nodes, set codes, view health, trigger discovery, drag-and-drop priority
Models per Node     Browse discovered models per node
Models > Mappings   Display↔Real name mappings, node-scoped overrides, context length, capabilities
Models > Groups     Create groups, add members, set strategy, reorder fallbacks
Models > Config     Per-model tool restrictions and settings
Tool Sets           Create tool groups and assign to models
Request Logs        Filterable request history with source identification (Cursor, Claude, OpenClaw, Grafana, etc.)
Settings            System-wide configuration
Audit Logs          Filterable history of all admin actions

Default login: username admin, password from ADMIN_PASSWORD in .env.


API Reference

For the complete API reference with all request/response examples, see docs/API.md.

Authentication

Every LLM request requires:

Authorization: Bearer <jwt-token>

Admin endpoints require:

Authorization: Bearer <admin-token>
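
The examples below assume both tokens are exported in your shell; the user JWT is issued via the admin panel or the user token endpoint, and the admin token is the ADMIN_TOKEN value from .env:

# Values shown are placeholders, not working tokens
export TOKEN="<jwt-token-for-a-user>"
export ADMIN_TOKEN="<value-of-ADMIN_TOKEN-from-.env>"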

LLM Endpoints

Method   Endpoint          Description
POST     /api/chat         Chat completions (Ollama format)
POST     /api/generate     Text generation
POST     /api/embeddings   Generate embeddings
GET      /api/tags         List available models
POST     /api/show         Show model info
POST     /api/copy         Copy model
DELETE   /api/delete       Delete model
POST     /api/pull         Pull model
POST     /api/push         Push model
POST     /api/create       Create model from Modelfile
POST     /v1/completions   OpenAI-compatible completions
POST     /v1/embeddings    OpenAI-compatible embeddings

Example — Chat

curl -X POST http://localhost:8000/api/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss:120b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

Example — Streaming Chat

curl -X POST http://localhost:8000/api/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss:120b",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
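
Tip: when watching a stream interactively, add curl's -N (--no-buffer) flag so chunks print as they arrive instead of being buffered.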

Admin Endpoints

Users

# Create user
curl -X POST http://localhost:8000/admin/users \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"username": "john"}'

# List users
curl http://localhost:8000/admin/users \
  -H "Authorization: Bearer $ADMIN_TOKEN"

# Refresh token
curl -X PUT http://localhost:8000/admin/users/john/token \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Model Assignment

# Assign specific models
curl -X POST http://localhost:8000/admin/users/john/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-oss:120b", "deepseek-v3.1:671b"]}'

# Grant access to all models
curl -X POST http://localhost:8000/admin/users/john/models/all \
  -H "Authorization: Bearer $ADMIN_TOKEN"

User Limits

# Set limits (null = unlimited)
curl -X POST http://localhost:8000/admin/users/john/limits \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"request_limit": 1000, "token_limit": 1000000}'

Model Mappings

# Create mapping with context length
curl -X POST http://localhost:8000/admin/model-mappings \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "gpt-oss:120b",
    "real_name": "gpt-oss:120b-cloud",
    "context_length": 128000,
    "capabilities": ["completion", "tools"]
  }'

# List
curl http://localhost:8000/admin/model-mappings \
  -H "Authorization: Bearer $ADMIN_TOKEN"

# Delete
curl -X DELETE http://localhost:8000/admin/model-mappings/gpt-oss:120b \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Nodes

# Add node (with optional code for prefix routing)
curl -X POST http://localhost:8000/admin/nodes \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "main",
    "base_url": "http://localhost:11434",
    "priority": 100,
    "code": "trmix",
    "node_type": "ollama"
  }'

# Toggle activation
curl -X PATCH http://localhost:8000/admin/nodes/1/toggle \
  -H "Authorization: Bearer $ADMIN_TOKEN"

# Reorder node priorities (drag-and-drop)
curl -X PATCH http://localhost:8000/admin/nodes/batch/priority \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"priorities": [{"id": 1, "priority": 200}, {"id": 2, "priority": 100}]}'

Model Groups

# Create group
curl -X POST http://localhost:8000/admin/model-groups \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "coding", "strategy": "round_robin", "description": "Code models"}'

# Add member
curl -X POST http://localhost:8000/admin/model-groups/coding/members \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model_display_name": "qwen3-coder:480b", "priority": 1}'

Grafana Assistant

# List chats
curl http://localhost:8000/grafana/assistant/chats \
  -H "Authorization: Bearer $TOKEN"

# Create chat
curl -X POST http://localhost:8000/grafana/assistant/chats \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'

# Stream chat
curl -X POST http://localhost:8000/grafana/assistant/chat/stream \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'

# Get LLM config
curl http://localhost:8000/grafana/assistant/config \
  -H "Authorization: Bearer $TOKEN"

# Update LLM config
curl -X POST http://localhost:8000/grafana/assistant/config \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss:120b", "temperature": 0.7}'

# Check infrastructure discovery status
curl http://localhost:8000/grafana/assistant/discovery \
  -H "Authorization: Bearer $TOKEN"

OpenAI Compatible

Method   Endpoint               Description
POST     /v1/chat/completions   Chat completions (OpenAI format)
POST     /v1/completions        Text completions (OpenAI format)
POST     /v1/embeddings         Embeddings (OpenAI format)
GET      /v1/models             Model list (OpenAI format)

Example — OpenAI Compatible

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss:120b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Model Mapping & Routing

Display Name → Real Name

Client sends:       gpt-oss:120b
Proxy looks up:     gpt-oss:120b → gpt-oss:120b-cloud
Ollama receives:    gpt-oss:120b-cloud

Real Name → Display Name

Ollama returns:     gpt-oss:120b-cloud
Proxy translates:   gpt-oss:120b-cloud → gpt-oss:120b
Client sees:        gpt-oss:120b
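
A quick way to observe the reverse mapping (assuming a mapping for gpt-oss:120b exists, as in the admin examples above):

# /api/tags lists display names (gpt-oss:120b), not backend names (gpt-oss:120b-cloud)
curl http://localhost:8000/api/tags \
  -H "Authorization: Bearer $TOKEN"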

Node Prefix Routing

Force a request to a specific node by prefixing the model name with its code:

Client sends:       node:trmix:kimi-k2.6:latest
Gateway parses:     code = "trmix", model = "kimi-k2.6:latest"
Node lookup:        trmix → node #3
Model mapping:      kimi-k2.6:latest → kimi-k2.6:latest-cloud
Node #3 receives:   kimi-k2.6:latest-cloud
  • Syntax: node:{code}:{model_name}
  • The code is the unique short identifier set on each node in the admin panel.
  • If the code does not exist, the gateway returns a 404 error: Node with code 'x' not found.
  • When a prefix is present, the load balancer is skipped and the request goes directly to the matched node.
  • Prefix routing works on every endpoint that accepts a model parameter: /api/chat, /api/generate, /v1/chat/completions, /v1/embeddings, etc.
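
For example, a prefixed chat request (assuming a node with code trmix exists and hosts the model):

curl -X POST http://localhost:8000/api/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "node:trmix:kimi-k2.6:latest",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'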

Model Groups

If the requested model is a group, the gateway resolves it dynamically:

  1. Detect if the request needs vision (image content in messages).
  2. Filter members by capability tags (vision, tools).
  3. Pick a member using the group's strategy:
    • round_robin — cycle through members
    • weighted — weighted random selection
    • priority — always pick lowest priority number
  4. If the selected model fails, retry with the next member in priority order.
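
Group resolution is transparent to the client: request the group name wherever a model name is accepted. A sketch, assuming the coding group from the admin examples above:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "coding",
    "messages": [{"role": "user", "content": "Write a quicksort in Python"}]
  }'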

Node-Scoped Mappings

A model mapping can be bound to a specific node so the same display name resolves to a different real name on different backends. This is useful when nodes host different variants of the same model (e.g. a CPU-quantized version on one node and a full-GPU version on another).
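
A hypothetical sketch of creating such a mapping via the admin API. The exact payload field for node scoping is not documented in this README, so the node_id field below is an assumption; see docs/API.md for the real schema:

# NOTE: "node_id" is a hypothetical field name, not confirmed by this README
curl -X POST http://localhost:8000/admin/model-mappings \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "kimi-k2.6:latest",
    "real_name": "kimi-k2.6:latest-q4",
    "node_id": 2
  }'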


Troubleshooting

Restart the full stack

docker compose -f docker-compose.dev.yml down
docker compose -f docker-compose.dev.yml up --build -d

Run migrations manually

docker exec maestro alembic upgrade head

Re-run seeds

docker exec maestro python -m app.seeder --reset
docker exec maestro python -m app.seeder

Clear cache

docker exec maestro python scripts/clear_cache.py

Check PostgreSQL health

docker exec maestro-postgres pg_isready -U maestro_user -d maestro

Check Redis

docker exec maestro-redis redis-cli ping

View logs

# All services
docker compose -f docker-compose.dev.yml logs -f

# API only
docker compose -f docker-compose.dev.yml logs -f maestro

# Frontend only
docker compose -f docker-compose.dev.yml logs -f frontend

Development

Project Structure

model-maestro/
├── app/
│   ├── main.py              # FastAPI app, routers, docs auth
│   ├── proxy.py             # Proxy logic, model routing, failover, tool call parsing
│   ├── config.py            # Settings, ModelMappingManager, ModelGroupManager
│   ├── auth.py              # JWT authentication
│   ├── models.py            # Pydantic request/response models
│   ├── models_db.py         # SQLAlchemy ORM models
│   ├── database.py          # Async DB engine & session maker
│   ├── redis.py             # Redis client & queue
│   ├── load_balancer.py     # Node selection algorithms
│   ├── node_manager.py      # Health checks, discovery, node CRUD
│   ├── user_manager.py      # User CRUD
│   ├── background_tasks.py  # Activity log processor, health checks, model warmup
│   ├── openclaw.py          # OpenClaw integration
│   ├── admin*.py            # Admin API routers
│   ├── repositories/        # Data access layer
│   ├── services/            # Business logic layer
│   └── seeds/               # DB seed migrations
├── frontend/
│   ├── src/app/             # Next.js App Router pages
│   ├── src/components/      # React components (sidebar, shell, etc.)
│   └── public/              # Static assets (logo, favicon)
├── docs/                    # Documentation (architecture, API, setup)
├── alembic/                 # Alembic migrations
├── tests/                   # pytest suite
├── docker-compose.dev.yml   # Dev stack (PG + Redis + API + Frontend)
├── docker-compose.yml       # Production stack (API + Frontend only)
└── Dockerfile               # FastAPI container

Running Tests

python -m pytest tests/ -v
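
If the stack is running via Docker, the same suite can be run inside the API container (assuming the container name maestro from Quick Start):

docker exec maestro python -m pytest tests/ -v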

Lint & Format

# Backend
python -m black app/
python -m ruff check app/

# Frontend
cd frontend && npm run lint

Documentation

  • docs/ARCHITECTURE.md — System architecture, request flow, database schema
  • docs/API.md — Complete API reference with all endpoints, requests and responses
  • docs/SETUP.md — Detailed setup guide, environment variables, production deployment
  • QUICKSTART.md — Get running in under 5 minutes

License

MIT
