Hecate

Hecate is an open-source AI gateway and agent-task runtime for teams that want one control plane for model access, cost governance, routing, caching, observability, and controlled agent execution.

It sits between AI clients and model providers. Existing OpenAI-compatible and Anthropic-compatible clients can point at Hecate, while operators get a place to manage providers, costs, traces, cache behavior, and queued agent work. Multi-tenant management is opt-in — the default deployment is a single-user gateway with one admin bearer.

Why Hecate
Modes
Quick Start
Architecture
Operator UI
What Works Today
Configuration
Documentation
Contributing
License

Why Hecate

AI workloads are moving from simple API calls to long-running agents, tool use, local/cloud routing, and budget-sensitive automation. Hecate is built for that messier runtime layer.

One gateway for many clients — OpenAI Chat Completions and Anthropic Messages shapes.
Local and cloud providers together — OpenAI, Anthropic, Ollama, LM Studio, LocalAI, llama.cpp-compatible servers, and other shipped presets.
Operator-controlled spend — balances, pricebook management, rate limits, audit history, and (opt-in) per-tenant API keys with model/provider scoping.
Runtime visibility — request ledger, route reports, failover details, cost, cache path, trace IDs, and OpenTelemetry export.
Agent-task runtime — queued tasks, approvals, controlled shell/file/git execution, resumable runs, and MCP integration.
Single binary deploy — Go gateway with the React operator UI embedded via go:embed. One process, one port, one volume; no separate frontend service to run.

Modes

Hecate runs in one of two modes. The flag flips at startup; you can switch between runs without losing state.

	Single-user (default)	Multi-tenant (opt-in)
Flag	`GATEWAY_MULTI_TENANT=false`	`GATEWAY_MULTI_TENANT=true`
Auth	One admin bearer; loopback handshake auto-fills it for same-host browsers.	Admin bearer plus per-tenant API keys, each scoped to allowed providers and models.
Operator UI	Chats, Providers, Tasks, Observability, Costs, Settings (Pricing / Policy / Retention).	Same plus the Tenants and Keys tabs in Settings.
Observability	Admin sees everything; tenants see nothing because there are no tenants.	Tenants see their own traces / requests / runtime stats via `/v1/` mirrors of the `/admin/` endpoints.
Use when	One operator on one host; local dev; a personal gateway behind a single key.	Multiple consumers, per-key audit, scoped credentials.

The published Docker image ships single-user. Full breakdown in docs/tenants.md.

Quick Start

Single-user path; for multi-tenant see docs/tenants.md.

1. Run the image

docker run --rm -p 8765:8765 -v hecate-data:/data \
  ghcr.io/chicoxyzzy/hecate:0.1.0-alpha.7

2. Open the UI

Open http://127.0.0.1:8765. On a localhost browser the console picks up the generated admin bearer through a same-origin loopback handshake — no token paste needed.

3. Add a provider

The Providers tab starts empty. Click Add provider, pick a preset (or Custom for any OpenAI-compatible endpoint), and paste an API key (cloud) or endpoint URL (local).

Empty Providers tab on first boot — Add provider CTA

Add provider modal on the Cloud tab — preset catalog

Providers table populated with three providers — Health, Endpoint, Credentials, Models

Cloud presets need an API key; local presets just need the runtime listening on its default port. Full catalog, custom-endpoint walk-through, and credential rotation in docs/providers.md.

4. Talk to it

Chats workspace talking to a local Ollama llama3.1:8b model with sessions sidebar and inline runtime metadata

Remote browsers, reverse proxies, and cross-origin setups

The loopback handshake only fires for same-host browsers. Anywhere else (Tailscale, port-forward over SSH, reverse proxy with a different hostname) the UI shows a token paste prompt:

Token paste prompt — remote / cross-origin browsers

The bootstrap token is printed once to the container logs:

============================================================
  Hecate first-run setup — admin bearer token generated.

    7f2a91b... (truncated)

  Saved to /data/hecate.bootstrap.json (mode 0600).
============================================================

It also lives in hecate.bootstrap.json on the hecate-data volume — recovery instructions in docs/deployment.md.

Other install paths (clone, Postgres profile, source build, env-as-code)

Cloning the repo lets you pick up optional compose profiles or rebuild from source:

docker compose up                    # uses the ghcr.io image; first run pulls
docker compose --profile postgres up # adds Postgres for durable state across all subsystems

Local development:

make dev

Pinned image tags, single-file binaries (linux/darwin × amd64/arm64), and checksums are in docs/deployment.md. Local development knobs in docs/development.md.

Provider keys can also be pre-seeded via .env for fleet automation — PROVIDER_<NAME>_API_KEY, _BASE_URL, _DEFAULT_MODEL, plus the _PRECONFIGURED=1 gate. See docs/providers.md. The /admin/control-plane/providers endpoints mirror every UI action for programmatic management.

Architecture

One Go process, one port. Inside it: a chat/messages gateway that mediates client traffic to upstream providers, and a task runtime that queues agent work, drives approvals, and shells out through a sandbox boundary. The React operator UI is embedded into the same binary and served from the same port.

flowchart LR
    Clients["Clients<br/>Codex, Claude Code, SDKs"]
    Browser["Browser<br/>(operator UI)"]

    subgraph Hecate["Hecate (single binary, :8765)"]
        direction TB
        Gateway["Gateway<br/>/v1/chat/completions<br/>/v1/messages<br/>/v1/models"]
        Runtime["Task runtime<br/>/v1/tasks/*<br/>queue + workers + sandbox"]
        UI["Embedded UI<br/>(go:embed ui/dist)"]
    end

    Clients --> Gateway
    Clients --> Runtime
    Browser --> UI
    UI --> Gateway
    UI --> Runtime

    Gateway --> Providers["Cloud + local providers"]
    Gateway --> Cache["Exact + semantic cache"]
    Runtime --> Sandbox["sandboxd<br/>(out-of-process exec)"]
    Runtime --> MCP["External MCP servers"]
    Gateway --> OTel["OpenTelemetry"]
    Runtime --> OTel

For deeper internals, read docs/architecture.md, docs/runtime-api.md, and docs/events.md.

Operator UI

The embedded UI is a runtime console for operators.

Chats — send requests through Hecate, choose provider/model, inspect per-turn route/cost/cache metadata.
Providers — manage provider credentials, defaults, model discovery, base URLs, and health.
Tasks — create and manage agent runs, approvals, retries, resumes, and streamed output.
Observability — inspect requests, route candidates, skip reasons, failover, costs, cache decisions, and trace events.
Costs — balance, top-up / reset, usage table.
Settings — pricebook, policy rules, retention, and (when GATEWAY_MULTI_TENANT=true) tenants + API keys.

Various UI screenshots

Observability view — request ledger and route-report drilldown

Tasks workspace — task list with run state and approval queue

Costs workspace — balance card and per-key usage table

Settings → Pricebook — model catalog with priced / unpriced / deprecated filters

What Works Today

Hecate is public-alpha software. The core gateway path is usable; the agent runtime and sandbox are intentionally still evolving.

Area	State	Notes
OpenAI-compatible gateway	Usable	Chat Completions, streaming, vision, model discovery
Anthropic-compatible gateway	Usable	Messages API shape, streaming translation, Claude Code support
Provider catalog	Usable	Built-in presets, encrypted credentials, health, routing readiness
Local providers	Usable	Ollama, LM Studio, LocalAI, llama.cpp-compatible servers
Auth	Usable	Admin bearer with same-origin loopback handshake; `GATEWAY_AUTH_DISABLED` for upstream-terminated auth
Tenants and API keys	Opt-in	`GATEWAY_MULTI_TENANT=true` exposes tenant + key management with provider/model scoping
Budgets and rate limits	Usable	Balances, warning thresholds, pricebook, `429` rate-limit headers
Caching	Usable	Exact cache; semantic cache is available but still early
OpenTelemetry	Usable	OTLP traces, metrics, logs, response headers, local trace view
Storage tiers	Usable	Memory, SQLite, Postgres, selected per subsystem
Operator UI	Usable	Main workflows are present; debugging ergonomics are still improving
Agent task runtime	Alpha	Queues, approvals, resumable runs, `agent_loop`, MCP integration
Execution isolation	Alpha	`sandboxd` boundary exists; stronger OS-level isolation is future work

Read docs/known-limitations.md before treating Hecate as production-stable.

Configuration

The README intentionally stays light on configuration. The source of truth is:

.env.example — practical first-run environment knobs.
docs/deployment.md — Docker, storage tiers, rate limits, image pinning, reset/recovery.
docs/providers.md — provider presets, local runtimes, credentials, health.
docs/telemetry.md — OTLP traces, metrics, logs, collector recipes.
docs/agent-runtime.md — task runtime, approvals, tools, workspace modes.
docs/mcp.md — MCP server and MCP tool integration.

Documentation

Browse the full index at docs/README.md. Highlights:

Run it — Deployment, Providers, Tenants and API keys, Known limitations
Use it — Runtime API, Agent runtime, Events, MCP integration
Observe it — Telemetry
Build it — Architecture, Development, Release, Chat sessions internals

Contributing

See CONTRIBUTING.md. If you work with an AI assistant, start with AGENTS.md; the vendor-neutral agent instruction layer lives in ai/.

License

MIT. See LICENSE.

Third-party data and software notices live in NOTICE.md, including LiteLLM pricing-data attribution.

hecate