OmniHarness

Name: omniHarness
Author: archimedes-run

A self-extending agent platform — the agent builds, registers, and verifies its own capabilities at runtime, instead of you wiring them up and restarting the stack.

OmniHarness is a long-horizon agent harness for tasks that take minutes to hours: it researches, codes, and creates inside real sandboxes, with memory, subagents, skills, and a multi-channel gateway. What makes it different is the bet underneath it — that the capability layer itself should be authored by the agent and hot-loaded, and that the agent should close its own build-run-fix loops instead of handing you errors to relay back.

Built on LangGraph and LangChain. Works with any OpenAI-compatible LLM.

Why OmniHarness?

Relationship to DeerFlow: OmniHarness began as a fork of DeerFlow 2.0 (ByteDance, MIT) and owes it a real debt — see Acknowledgements. It has since diverged into an independent project with its own thesis and roadmap. OmniHarness is not affiliated with or endorsed by ByteDance.

DeerFlow is an excellent, fast-moving SuperAgent harness. If you want the best-supported research/coding agent with the broadest community, use DeerFlow — it's backed by a large team.

OmniHarness exists for a different goal: removing the human from the inner loop entirely.

The agent closes its own loops. It doesn't just write code and stop — it runs the result, sees what breaks, fixes it, and confirms it's clean before handing anything back. Live preview is the first concrete instance of this.
The agent extends the platform itself. New connectors, MCP servers, skills, and workflows are things the agent builds and registers at runtime — not things you hand-wire and docker restart to install.

If that direction is what you're after, OmniHarness is the bet. If you need the broadest community and production mileage today, DeerFlow is the safer pick — and we'll say so honestly.

Available now

Live preview for web app development

The agent builds a web app and the preview runs inside the sandbox with the gateway proxying it straight to your browser — no separate setup. For static pages and dev-server projects (anything with an index.html or a package.json dev script), the preview just comes up.

The self-verifying loop: preview auto-starts when the agent produces a web artifact, and a verification gate stops the agent from declaring victory while the build is broken — it reads the dev-server/build error, fixes its own code, and re-checks until the preview runs clean. You stop being the courier who copy-pastes errors back to the agent.

The harness foundation

Sandboxed execution for code, shell, and file work — isolated Docker containers per thread
Subagents for decomposing long-horizon tasks and running them in parallel
Persistent memory across sessions
Skills — structured, domain-specific capability bundles, hot-loaded on demand
Multi-channel gateway — drive the agent from Slack, Telegram, Feishu, DingTalk, or the web UI
MCP client + OAuth for connecting external tools

Roadmap

These are in development, not done — listed so you know where OmniHarness is going.

Agent-authored connectors, bring-your-own-key. Describe an integration; the agent writes the connector code and wires it up. Your API keys are scoped per-connector and never readable back by the model.
Agent-built MCP servers. The agent scaffolds, builds, and registers MCP servers at runtime.
Hot skill installation. Add or update a skill live — no container restart.
Workflows + triggers. Multi-step automations fired on schedules, webhooks, or channel events.

On safety: because OmniHarness lets the agent write and run code with your credentials, the guardrails are the product. Agent-authored code runs sandboxed off the control plane, secrets are scoped per-capability and never exposed back to the model, and first execution of any agent-built capability passes a human approval gate.

Quick Start
Core Features
Recommended Models
Embedded Python Client
Security Notice
Contributing
License
Acknowledgments

Quick Start

Docker is the only supported way to run OmniHarness. It handles all service wiring, sandbox isolation, and path mappings automatically.

Prerequisites

Requirement	Minimum	Notes
Docker Desktop	4.x+	macOS, Windows, Linux. OrbStack also works on macOS.
Docker Compose	v2	Bundled with Docker Desktop
Disk space	12 GB free	~9 GB for the sandbox image + build artefacts
RAM	8 GB	16 GB recommended for parallel sub-agents
CPU	4 vCPU	8 vCPU recommended

Linux users: add your user to the docker group (sudo usermod -aG docker $USER) so you can run Docker commands without sudo.

Configuration

Clone the repository

git clone https://github.com/archimedes-run/omniHarness.git
cd omni-harness

Run the setup wizard

make setup

The wizard guides you through choosing an LLM provider, optional web search, and sandbox settings. It generates config.yaml and writes your API keys to .env.

Run make doctor at any time to verify your configuration.

Manual model configuration examples

models:
  - name: gpt-4o
    display_name: GPT-4o
    use: langchain_openai:ChatOpenAI
    model: gpt-4o
    api_key: $OPENAI_API_KEY

  - name: claude-sonnet-4-6
    display_name: Claude Sonnet 4.6
    use: langchain_anthropic:ChatAnthropic
    model: claude-sonnet-4-6
    api_key: $ANTHROPIC_API_KEY
    supports_thinking: true

  - name: gemini-2-5-pro
    display_name: Gemini 2.5 Pro
    use: langchain_google_genai:ChatGoogleGenerativeAI
    model: gemini-2.5-pro
    gemini_api_key: $GEMINI_API_KEY

  - name: openrouter-model
    display_name: Any model via OpenRouter
    use: langchain_openai:ChatOpenAI
    model: google/gemini-2.5-flash-preview
    api_key: $OPENROUTER_API_KEY
    base_url: https://openrouter.ai/api/v1

  - name: qwen3-32b-vllm
    display_name: Qwen3 32B (self-hosted vLLM)
    use: omniharness.models.vllm_provider:VllmChatModel
    model: Qwen/Qwen3-32B
    api_key: $VLLM_API_KEY
    base_url: http://localhost:8000/v1
    supports_thinking: true
    when_thinking_enabled:
      extra_body:
        chat_template_kwargs:
          enable_thinking: true

CLI-backed providers:

models:
  - name: codex-cli
    display_name: GPT-5.4 (Codex CLI)
    use: omniharness.models.openai_codex_provider:CodexChatModel
    model: gpt-5.4
    supports_thinking: true

  - name: claude-code-oauth
    display_name: Claude Sonnet 4.6 (Claude Code OAuth)
    use: omniharness.models.claude_provider:ClaudeChatModel
    model: claude-sonnet-4-6
    supports_thinking: true

Codex CLI reads auth from ~/.codex/auth.json
Claude Code OAuth reads from CLAUDE_CODE_OAUTH_TOKEN, ANTHROPIC_AUTH_TOKEN, or ~/.claude/.credentials.json

Set the repository root in .env

The Docker sandbox uses bind mounts to give each agent thread an isolated filesystem on the host. It needs to know the absolute path to this repository:
```
# .env
OMNI_HARNESS_ROOT=/absolute/path/to/omni-harness
```
On macOS/Linux you can generate this automatically:
```
echo "OMNI_HARNESS_ROOT=$(pwd)" >> .env
```
This is the most common reason sandbox tools (bash, write_file, execute_python) silently fail. If agent output looks like it's running but files never appear, check this value first.

Running with Docker

1. Pull the sandbox image

make docker-init

This pulls the default AIO Sandbox container image (ghcr.io/archimedes-run/omni-harness-sandbox:latest, ~9 GB). The AIO Sandbox is a pre-built, isolated execution environment with Python 3, Node.js, bash, and common system tools — one container is spawned per agent thread and automatically cleaned up after 10 minutes of idle time.

Expected: ghcr.io/archimedes-run/omni-harness-sandbox:latest

Override the image for private mirrors or local builds by setting SANDBOX_IMAGE before running make docker-init or make docker-start.

You only need to run this once, or again after an image update. The download can take several minutes depending on your connection.

Verifying the image is ready:

docker images | grep omni-harness-sandbox
# ghcr.io/archimedes-run/omni-harness-sandbox   latest   <id>   <date>   ~9GB

Manual override (if you use a private mirror or custom sandbox image):

SANDBOX_IMAGE=registry.example.com/omni-harness-sandbox:latest make docker-init
SANDBOX_IMAGE=registry.example.com/omni-harness-sandbox:latest make docker-start

Then confirm config.yaml uses the same image, or omit image to use the default:

sandbox:
  use: omniharness.community.aio_sandbox:AioSandboxProvider
  image: ghcr.io/archimedes-run/omni-harness-sandbox:latest

2. Start all services

make docker-start

Access: http://localhost:2026

On first run, make docker-start creates missing .env and frontend/.env files from their examples. If config.yaml is also missing, it creates one from config.example.yaml and stops so you can add API keys and model settings before starting containers.

This starts three containers:

Container	Role	Internal port
`omni-harness-nginx`	Reverse proxy — single entry point	2026
`omni-harness-gateway`	Backend API + LangGraph agent runtime	8001
`omni-harness-frontend`	Next.js web UI (hot-reload in dev)	3000

Source code is mounted directly into the containers (backend/ and frontend/src/), so code changes are reflected without a rebuild.

Stop services:

make docker-stop

Restart after a config change:

make docker-stop && make docker-start

config.yaml changes (models, skills, tools) are hot-reloaded automatically by the gateway — no restart needed for those. A restart is only required when adding new environment variables to .env.

3. How it works under the hood

Understanding this architecture helps diagnose issues quickly.

Your browser
    │
    ▼
nginx (port 2026)
    ├── /api/langgraph/* → gateway:8001  (agent runtime / LangGraph)
    ├── /api/*          → gateway:8001  (REST API)
    └── /*              → frontend:3000 (Next.js UI)

gateway container
    ├── Reads config.yaml (hot-reload on mtime change)
    ├── Mounts /var/run/docker.sock  ← can run Docker commands on the host
    ├── On each agent thread:
    │     docker run ghcr.io/archimedes-run/omni-harness-sandbox:latest  (spawns on HOST daemon)
    │         bind-mount: $OMNI_HARNESS_ROOT/backend/.omni-harness/
    │                     users/{user}/threads/{thread}/user-data/
    │     Sandbox is reachable at host.docker.internal:{port}
    └── Sandbox containers auto-removed after 10 min idle (--rm + idle GC)

Key environment variables set automatically by docker-compose:

Variable	Value	Purpose
`OMNI_HARNESS_ROOT`	from `.env`	Root of the repo on the host — required for bind mounts
`OMNI_HARNESS_HOST_BASE_DIR`	`$OMNI_HARNESS_ROOT/backend/.omni-harness`	Host path prefix for thread directories
`OMNI_HARNESS_SANDBOX_HOST`	`host.docker.internal`	Hostname to reach sandbox containers from inside the gateway
`OMNI_HARNESS_HOST_SKILLS_PATH`	`$OMNI_HARNESS_ROOT/skills`	Host path for skills directory mount
`SANDBOX_IMAGE`	`ghcr.io/archimedes-run/omni-harness-sandbox:latest`	Sandbox container image; override for mirrors or custom builds

4. Management commands

make docker-start       # Start all services (dev mode, hot-reload)
make docker-stop        # Stop all services
make docker-restart     # Stop then start
make docker-logs        # Tail logs from all containers
make docker-status      # Show container status
make docker-init        # Pull the configured/default sandbox image (run once)

View logs for a specific service:

docker logs -f omni-harness-gateway    # backend + agent logs
docker logs -f omni-harness-frontend   # Next.js build / page logs
docker logs -f omni-harness-nginx      # proxy access logs

List active sandbox containers (spawned per agent thread):

docker ps --filter "name=omni-harness-sandbox"

5. Troubleshooting sandbox issues

The sandbox is what gives agents the ability to run bash, write files, and execute Python. If those tools silently fail or return errors, work through this checklist:

Sandbox containers never start

Check that the image exists and the tag matches config.yaml:

docker images | grep omni-harness-sandbox
# Expected: ghcr.io/archimedes-run/omni-harness-sandbox   latest   <id>

If missing, re-run make docker-init.

Manual override (if you use a private mirror or custom sandbox image):

SANDBOX_IMAGE=your-registry/your-image:tag make docker-init
SANDBOX_IMAGE=your-registry/your-image:tag make docker-start

bind source path does not exist errors in gateway logs

OMNI_HARNESS_ROOT is unset or wrong. The gateway uses it to build host-side bind-mount paths. Fix:

# In your .env file at the project root:
OMNI_HARNESS_ROOT=/absolute/path/to/omni-harness

Then restart: make docker-stop && make docker-start.

Sandbox starts but agent can't reach it

The gateway connects to sandbox containers via host.docker.internal. Verify it resolves:

docker exec omni-harness-gateway ping -c 1 host.docker.internal

On Linux, add extra_hosts: ["host.docker.internal:host-gateway"] to the gateway service in docker/docker-compose-dev.yaml if it doesn't resolve.

Port conflicts on startup

Port 2026 is the default entry point. If it's in use:

lsof -i :2026
# Change the port in docker/docker-compose-dev.yaml:
#   ports: ["YOUR_PORT:2026"]

Checking the gateway picked up your config.yaml changes

docker exec omni-harness-gateway /app/backend/.venv/bin/python3 -c "
from omniharness.config.app_config import get_app_config
cfg = get_app_config()
print('sandbox.image:', cfg.sandbox.image)
print('models:', [m.name for m in cfg.models])
"

Production deployment

For a stable, persistent server (pre-built images, no source mounts):

make up      # Build images and start all services
make down    # Stop and remove containers

Target	Minimum	Recommended
Docker dev (`make docker-start`)	4 vCPU, 8 GB RAM	8 vCPU, 16 GB RAM
Production (`make up`)	8 vCPU, 16 GB RAM	16 vCPU, 32 GB RAM

Advanced

Sandbox Mode

OmniHarness supports three sandbox execution modes:

Docker (recommended) — each thread gets an isolated container with a full filesystem, powered by AIO Sandbox
Local — file tools mapped to per-thread host directories; host bash disabled by default
Kubernetes — containers provisioned as Pods via the optional provisioner service

config.yaml sandbox section:

sandbox:
  use: omniharness.community.aio_sandbox:AioSandboxProvider
  image: ghcr.io/archimedes-run/omni-harness-sandbox:latest
  bash_output_max_chars: 20000
  read_file_output_max_chars: 50000
  ls_output_max_chars: 20000

See Configuration Guide for all options including idle timeout, replica count, and Kubernetes provisioner setup.

MCP Servers

Connect any MCP server to extend the agent's toolset. HTTP/SSE and stdio transports are supported. OAuth token flows (client_credentials, refresh_token) are supported for HTTP/SSE servers.

See MCP Server Guide for setup instructions.

IM Channels

OmniHarness can receive tasks from messaging apps. Channels auto-start when configured and require no public IP.

Channel	Transport
Telegram	Bot API long-polling
Slack	Socket Mode

Configuration in config.yaml:

channels:
  langgraph_url: http://localhost:8001/api
  gateway_url: http://localhost:8001

  telegram:
    enabled: true
    bot_token: $TELEGRAM_BOT_TOKEN
    allowed_users: []   # empty = allow all

  slack:
    enabled: true
    bot_token: $SLACK_BOT_TOKEN    # xoxb-...
    app_token: $SLACK_APP_TOKEN    # xapp-... (Socket Mode)
    allowed_users: []

Keys in .env:

TELEGRAM_BOT_TOKEN=...
SLACK_BOT_TOKEN=xoxb-...
SLACK_APP_TOKEN=xapp-...

Telegram Setup

Chat with @BotFather, send /newbot, copy the token.
Set TELEGRAM_BOT_TOKEN in .env and enable the channel in config.yaml.

Slack Setup

Create a Slack App at api.slack.com/apps.
Under OAuth & Permissions, add scopes: app_mentions:read, chat:write, im:history, im:read, im:write, files:write.
Enable Socket Mode, generate an App-Level Token (xapp-…) with connections:write.
Subscribe to bot events: app_mention, message.im.
Set both tokens in .env and enable the channel in config.yaml.

When running in Docker Compose, use container service names: http://gateway:8001/api and http://gateway:8001, or set OMNI_HARNESS_CHANNELS_LANGGRAPH_URL and OMNI_HARNESS_CHANNELS_GATEWAY_URL.

In-chat commands:

Command	Description
`/new`	Start a new conversation
`/status`	Show current thread info
`/models`	List available models
`/memory`	View memory
`/help`	Show help

Observability

LangSmith:

LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY=lsv2_pt_...
LANGSMITH_PROJECT=my-project

Langfuse:

LANGFUSE_TRACING=true
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_BASE_URL=https://cloud.langfuse.com

Both can be enabled simultaneously.

Core Features

Skills & Tools

Skills are structured capability modules — a SKILL.md file that defines a workflow, best practices, and references. OmniHarness ships with built-in skills for research, report generation, slide creation, web pages, and chart visualisation. You can add your own or replace the built-in ones.

Skills are loaded progressively — only when a task needs them. This keeps the context window lean.

Tools ship as a core set — web search, web fetch, file operations, bash execution — and extend via MCP servers or custom Python functions.

/mnt/skills/public/
├── research/SKILL.md
├── report-generation/SKILL.md
├── slide-creation/SKILL.md
├── web-page/SKILL.md
└── chart-visualization/SKILL.md

/mnt/skills/custom/
└── your-skill/SKILL.md

Sub-Agents

The lead agent can spawn specialised sub-agents on the fly. Each runs in its own isolated context with scoped tools and a defined termination condition. Sub-agents execute in parallel when possible and report structured results back to the lead agent, which synthesises everything into a coherent output.

Up to 3 sub-agents run concurrently by default (configurable).

Sandbox & File System

Each task gets a per-thread execution environment powered by AIO Sandbox — an isolated Docker container with a full filesystem the agent can read, write, and execute inside.

/mnt/user-data/
├── uploads/     ← files you upload
├── workspace/   ← agent working directory
└── outputs/     ← final deliverables

Containers are spawned on demand, bind-mounted to a per-thread directory on the host, and removed automatically after 10 minutes of idle time. With LocalSandboxProvider, file tools map to per-thread host directories instead — host bash is disabled by default in that mode.

Long-Term Memory

Across sessions, OmniHarness builds a persistent memory of your preferences, working context, and accumulated facts. Memory is stored locally per user and injected into the system prompt on each turn. The agent extracts and stores facts asynchronously after each conversation.

Context Engineering

Isolated sub-agent context — each sub-agent sees only what it needs
Automatic summarisation — completed sub-task context is compressed to keep the window lean
Tool-call recovery — dangling tool calls from interrupted turns are resolved before the next model invocation so provider-strict models do not fail with malformed history errors

Recommended Models

OmniHarness works with any OpenAI-compatible LLM. It performs best with models that support:

Long context windows (100k+ tokens)
Reasoning / extended thinking
Multimodal input (vision)
Reliable tool use

Tested providers: OpenAI, Anthropic, Google Gemini, DeepSeek, vLLM (self-hosted), OpenRouter (multi-model gateway).

Embedded Python Client

Use OmniHarness as a library without running HTTP services:

from omniharness.client import OmniHarnessClient

client = OmniHarnessClient()

# Synchronous chat
response = client.chat("Summarise this paper", thread_id="my-thread")

# Streaming (LangGraph SSE protocol)
for event in client.stream("Write a report on quantum computing"):
    if event.type == "messages-tuple" and event.data.get("type") == "ai":
        print(event.data["content"], end="", flush=True)

# Management
models = client.list_models()
skills = client.list_skills()
client.update_skill("web-search", enabled=True)
client.upload_files("thread-1", ["./report.pdf"])

All dict-returning methods are validated against Gateway Pydantic response schemas in CI to keep the embedded client in sync with the HTTP API.

See backend/packages/harness/omniharness/client.py for full documentation.

Security Notice

OmniHarness has high-privilege capabilities — system command execution, file read/write — and is designed for deployment in a local trusted environment (loopback only by default).

If you expose it to a network, apply appropriate controls:

Authentication — OmniHarness has a built-in auth system. Ensure it is enabled and AUTH_JWT_SECRET is set to a strong secret in production.
IP allowlist — restrict inbound access to known addresses via firewall rules or nginx ACLs.
Network isolation — place the service and trusted clients on a dedicated VLAN where possible.
TLS — terminate HTTPS at the reverse proxy for any non-localhost deployment.

Contributing

Contributions are welcome. See CONTRIBUTING.md for development setup and workflow.

For local (non-Docker) development:

make check    # Verify prerequisites: Node.js 22+, pnpm, uv, nginx
make install  # Install backend + frontend dependencies
make dev      # Start all services with hot-reload (http://localhost:2026)

License

MIT — see LICENSE.

Acknowledgments

OmniHarness stands on the shoulders of DeerFlow by ByteDance, from which it was forked. The harness architecture, sandbox model, and gateway design all started there. DeerFlow is MIT-licensed; so is OmniHarness.

Original copyright notices retained per MIT terms:

OmniHarness is an independent project and is not affiliated with, sponsored by, or endorsed by ByteDance.

DeerFlow itself builds on the open-source community, and those debts carry through:

LangGraph — multi-agent orchestration runtime
LangChain — LLM integration framework
AIO Sandbox — isolated per-thread execution containers
Shadcn UI — frontend component system
Next.js — frontend framework

omniHarness