Marionette

A desktop AI coding harness where the LLM is a component inside the kernel,
not the platform. Marionette drives any model -- frontier or cheap open-weights --
through a structured pilot loop over Puppetmaster
durable state, with CodeGraph-aware retrieval, a portable cross-session knowledge
wiki, and multi-worker delegation.

Internal-first research rig and daily-driver app. stdlib-only backend (urllib +
sqlite); Puppetmaster is the one real dependency, installed editable from a local
checkout.

Status: v0.6.x, deliberately pre-1.0. Vetted privately before any wider release.

What it is

Marionette is a three-pane desktop app (Electron + a stdlib Python backend over
SSE):

Center -- the pilot loop. A conversational driver: you talk to it, it emits
structured tool calls (read, search, edit, run, delegate), and the harness
executes them in code. No "I will now call the tool" narrator tax.
Right -- durable state. Live CodeGraph stats, the portable wiki graph, and
the artifact feed from every action and delegated job.
Left -- workspace. Projects, git branches/worktrees, sessions (auto-named
from the first message), and the Puppetmaster job list.

The thesis: there is no single best model. Hot-swap the driver by task. A cheap
local model (e.g. qwen3-coder-30b via OpenRouter, cents per session) handles
inline work; heavier reasoning and multi-file changes delegate to Puppetmaster
workers the router selects.

Core capabilities

Capability	What it does
Provider-native pilot	One driver, every OpenAI-compatible endpoint (OpenRouter or native). Frontier control models (Claude, GPT) and open-weights (GLM, DeepSeek, Kimi, Qwen, MiniMax) drive the same loop.
CodeGraph-first retrieval	Per-turn structural context is auto-injected (symbols, defs, call sites) before the model acts, so it leans on the graph instead of dumping whole files. Self-healing: the index detects edits, additions, and deletions and refreshes in the background.
Puppetmaster delegation	run_swarm (read-only analysis), run_implement (edit-capable worktree worker), run_parallel (concurrent waves). Heavy/multi-file work runs as durable, auditable jobs.
Portable LLM Wiki	Cross-session, cross-LLM durable memory. A local model structures a session digest into entity/concept/decision pages (the "backwards" orchestration) cheaply, then ingests them -- human-approved by default.
Vision on any model	Paste or drop a screenshot; a VLM sidecar (Gemini, OpenRouter fallback) transcribes it so even a non-vision driver "sees" the image.
Full-auto mode	Unattended objective pursuit bounded by an AutoBudget governor (max swarms / tokens / seconds / idle), with a non-bypassable command safety guard.
Command safety guard	In full-auto, irreversible/remote/escalating shell commands (recursive deletes, ssh/scp, curl-pipe-to-shell, force-push, sudo, disk writes, key exfil) are screened and blocked; interactive co-working is untouched. Configurable per-command timeout (default 120s; 0/off = unbounded for long sessions).

Architecture

Electron renderer (React, three panes)
        |  window.harnessIPC  (IPC bridge: getJSON / postJSON / SSE / upload)
        v
stdlib Python backend  (harness/server.py)  --  SSE event stream
        |
   ConversationalSession  (harness/conversation.py)  -- the pilot loop
        |                         |                          |
  structured tools          CodeGraph context         Puppetmaster
  (read/search/edit/run)    (per-turn, self-heal)     Orchestrator(store).run(...)
                                                              |
                                                       durable SwarmStore
                                                       (jobs, artifacts, events)

Module	Role
`harness/conversation.py`	The pilot loop: prose + structured tool calls -> execute -> feed real results back. Yields events for the GUI.
`harness/server.py`	stdlib HTTP server streaming Session events over SSE; CodeGraph status, wiki graph, settings, upload, sessions.
`harness/command_policy.py`	Pure, stdlib-only: timeout resolution + danger classification for the full-auto guard.
`harness/wiki_orchestrator.py`	Local-model structuring of a session digest into wiki pages (PM-free, testable).
`harness/state.py`	DurableState: clean read layer over Puppetmaster's SwarmStore (jobs, artifacts, live events).
`harness/config.py`	Swappable driver, reach, budget.
`webapp/`	Electron app (`electron/` main + preload + IPC bridges) and the React renderer (`src/`).

The research rig (heritage)

Marionette grew out of an eval that asked: which LLMs can drive Puppetmaster as
a native harness layer? That rig still lives in pmharness/ and stays green:

Module	Role
`pmharness/intent.py`	DriverIntent contract + strict validator + lenient text->JSON parser. Pure; no PM dependency.
`pmharness/bridge.py`	Executes a validated intent against Puppetmaster's in-process Orchestrator (local adapter; deterministic, free).
`pmharness/drivers/`	Driver protocol: StubDriver (offline oracle), OpenAICompatDriver, plus Anthropic/Gemini drivers.
`pmharness/scoring.py`	Deterministic, no LLM-as-judge: json_valid, schema_valid, action_correct, executed_ok, composite.
`pmharness/ledger.py`	Append-only SQLite record of every attempt.

Driver eval grades driving, not working: swarm intents execute on
Puppetmaster's free local adapter so ground truth is deterministic and key-free.

Install and updates

Marionette is a real installed app that updates itself the way Hermes Desktop
does. Install it once from the DMG (drag to /Applications), and from then on
new releases download in the background and apply on the next relaunch -- no
script to run, no DMG to re-download per change. The status-bar update pill
shows when a new version is ready; click it (or just quit and reopen) and the
app relaunches on the new version. Cutting a release is described in
RELEASING.md.

Contributors who want to hack on Marionette run it from a git checkout instead;
in that mode the same pill pulls + rebuilds the source in place (see below).

One-command setup on a fresh machine (macOS, Apple Silicon):

curl -fsSL https://raw.githubusercontent.com/professorpalmer/pm-harness/main/scripts/bootstrap.sh | bash

Then launch it for daily use (production renderer; the self-updater rebuilds
into this mode):

bash scripts/start.sh          # or add:  alias marionette='bash ~/pm-harness/scripts/start.sh'

Run it (contributor / dev)

Desktop app with Vite hot-reload for active editing:

cd webapp && npm install && npm run electron:dev

Or the one-command dev launcher (cleans stale processes, then launches):

bash scripts/dev.sh

The signed, auto-updating DMG is cut with scripts/release.sh -- see
RELEASING.md. That installed app is the primary channel for everyone; the git
checkout above is for people actively developing Marionette.

Research rig (offline, no keys):

python3 -m venv .venv && .venv/bin/pip install -e /path/to/Puppetmaster pytest
.venv/bin/python -m pytest -q                              # full suite, offline
.venv/bin/python scripts/run_eval.py --drivers stub-oracle # offline oracle ceiling

Configuration

The driver and keys are set in the app (Settings pane) or via env. Key vars:

Env var	Purpose
`OPENROUTER_API_KEY`	Default reach: the whole field through one endpoint.
`GEMINI_API_KEY`	Vision sidecar (image transcription); OpenRouter VLM fallback.
`HARNESS_DRIVER`	Pilot model id.
`HARNESS_COMMAND_TIMEOUT`	Per-command shell timeout in seconds; 0/off = unbounded.
`HARNESS_AUTO_COMMAND_GUARD`	Full-auto danger guard; default on, off to disable.
`HARNESS_WIKI_ORCHESTRATE`	Local wiki structuring: unset (off), 1/approve (prepare-and-approve), auto (silent ingest).
`HARNESS_AUTO_MAX_SWARMS` / `_TOKENS` / `_SECONDS` / `_MAX_IDLE`	Full-auto budget governor ceilings.

Conventions

No emojis or decorative pictographs anywhere (code, docs, commits, output).
stdlib-only backend; Puppetmaster is the single real dependency.
pmharness/intent.py and harness/command_policy.py stay PM-free and pure so
they unit-test fast and hermetically.
Scoring is deterministic -- no LLM-as-judge.
Tests before claiming done: .venv/bin/python -m pytest -q.
Never commit keys or results/*.sqlite.