pm-harness
Health Warning
- No license — Repository has no license file
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code Passed
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Passed
- Permissions — No dangerous permissions requested
Marionette: a desktop AI coding harness. Drives any model (frontier or open-weights) through a structured pilot loop over Puppetmaster durable state, with CodeGraph-aware retrieval, a portable cross-session wiki, and multi-worker delegation. Internal-first.
Marionette
A desktop AI coding harness where the LLM is a component inside the kernel,
not the platform. Marionette drives any model -- frontier or cheap open-weights --
through a structured pilot loop over Puppetmaster
durable state, with CodeGraph-aware retrieval, a portable cross-session knowledge
wiki, and multi-worker delegation.
Internal-first research rig and daily-driver app. The backend is stdlib-only
(urllib + sqlite); Puppetmaster is the one real dependency, installed editable
from a local checkout.
Status: v0.6.x, deliberately pre-1.0. Vetted privately before any wider release.
What it is
Marionette is a three-pane desktop app (Electron + a stdlib Python backend over
SSE):
- Center -- the pilot loop. A conversational driver: you talk to it, it emits
  structured tool calls (read, search, edit, run, delegate), and the harness
  executes them in code. No "I will now call the tool" narrator tax.
- Right -- durable state. Live CodeGraph stats, the portable wiki graph, and
  the artifact feed from every action and delegated job.
- Left -- workspace. Projects, git branches/worktrees, sessions (auto-named
  from the first message), and the Puppetmaster job list.
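
The "harness executes them in code" step of the Center pane can be sketched roughly as follows. This is a minimal illustration, not Marionette's actual implementation: the JSON wire format (`tool` and `args` fields), the `HANDLERS` table, and the function names are all assumptions made for the example, and only a `read` tool is shown.

```python
import json
from pathlib import Path
from typing import Any, Callable

# Hypothetical handler; the real read/search/edit/run/delegate tools are not
# shown in this README, so this is a stand-in.
def tool_read(args: dict) -> str:
    """Return the text of the file named by args['path']."""
    return Path(args["path"]).read_text(encoding="utf-8")

# Assumed dispatch table: tool name -> handler.
HANDLERS: dict = {"read": tool_read}

def execute_tool_call(raw: str) -> dict:
    """Parse one structured tool call and run it.

    Failures are returned as data rather than raised, so the real result
    (success or error) can be fed back to the model on the next turn.
    """
    try:
        call = json.loads(raw)
        handler: Callable[[dict], Any] = HANDLERS[call["tool"]]
        return {"ok": True, "result": handler(call.get("args", {}))}
    except (ValueError, KeyError, TypeError, OSError) as exc:
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

Because the model only emits the structured call and the harness does the work, a malformed or unknown call costs one error result instead of a derailed turn.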
The thesis: there is no single best model. Hot-swap the driver by task. A cheap
open-weights model (e.g. qwen3-coder-30b via OpenRouter, at cents per session)
handles inline work; heavier reasoning and multi-file changes are delegated to
Puppetmaster workers that the router selects.
Core capabilities
| Capability | What it does |
|---|---|
| Provider-native pilot | One driver, every OpenAI-compatible endpoint (OpenRouter or native). Frontier control models (Claude, GPT) and open-weights (GLM, DeepSeek, Kimi, Qwen, MiniMax) drive the same loop. |
| CodeGraph-first retrieval | Per-turn structural context is auto-injected (symbols, defs, call sites) before the model acts, so it leans on the graph instead of dumping whole files. Self-healing: the index detects edits, additions, and deletions and refreshes in the background. |
| Puppetmaster delegation | run_swarm (read-only analysis), run_implement (edit-capable worktree worker), run_parallel (concurrent waves). Heavy/multi-file work runs as durable, auditable jobs. |
| Portable LLM Wiki | Cross-session, cross-LLM durable memory. A local model structures a session digest into entity/concept/decision pages (the "backwards" orchestration) cheaply, then ingests them -- human-approved by default. |
| Vision on any model | Paste or drop a screenshot; a VLM sidecar (Gemini, OpenRouter fallback) transcribes it so even a non-vision driver "sees" the image. |
| Full-auto mode | Unattended objective pursuit bounded by an AutoBudget governor (max swarms / tokens / seconds / idle), with a non-bypassable command safety guard. |
| Command safety guard | In full-auto, irreversible/remote/escalating shell commands (recursive deletes, ssh/scp, curl-pipe-to-shell, force-push, sudo, disk writes, key exfil) are screened and blocked; interactive co-working is untouched. Configurable per-command timeout (default 120s; 0/off = unbounded for long sessions). |
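
To make the full-auto budget idea concrete, here is a minimal sketch of a governor with the four ceilings named in the table (swarms, tokens, seconds, idle). Only the name `AutoBudget` comes from this README; the field names, the `record`/`exhausted` methods, and the reading of "idle" as consecutive turns without progress are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AutoBudget:
    """Illustrative ceilings; field names are assumed, not the real class."""
    max_swarms: int
    max_tokens: int
    max_seconds: float
    max_idle: int  # assumption: consecutive turns with no progress
    swarms: int = 0
    tokens: int = 0
    idle: int = 0
    started: float = field(default_factory=time.monotonic)

    def record(self, *, swarms: int = 0, tokens: int = 0, progressed: bool = True) -> None:
        """Account for one turn of work."""
        self.swarms += swarms
        self.tokens += tokens
        self.idle = 0 if progressed else self.idle + 1

    def exhausted(self, now: Optional[float] = None) -> Optional[str]:
        """Return the name of the first ceiling hit, or None to keep going."""
        elapsed = (time.monotonic() if now is None else now) - self.started
        if self.swarms >= self.max_swarms:
            return "swarms"
        if self.tokens >= self.max_tokens:
            return "tokens"
        if elapsed >= self.max_seconds:
            return "seconds"
        if self.idle >= self.max_idle:
            return "idle"
        return None
```

An unattended loop would call `record(...)` after each turn and stop as soon as `exhausted()` returns a ceiling name.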
Architecture
    Electron renderer (React, three panes)
            |  window.harnessIPC (IPC bridge: getJSON / postJSON / SSE / upload)
            v
    stdlib Python backend (harness/server.py) -- SSE event stream
            |
    ConversationalSession (harness/conversation.py) -- the pilot loop
        |                        |                        |
    structured tools         CodeGraph context        Puppetmaster
    (read/search/edit/run)   (per-turn, self-heal)    Orchestrator(store).run(...)
                                                          |
                                                      durable SwarmStore
                                                      (jobs, artifacts, events)
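
The SSE leg of the diagram can be illustrated with the standard library alone. The sketch below shows the Server-Sent Events wire format and a bare `http.server` handler that writes events to the response. The handler class, the event names, and the payloads are hypothetical; the real `harness/server.py` is not reproduced here.

```python
import json
from http.server import BaseHTTPRequestHandler

def format_sse(event: str, payload: dict) -> bytes:
    """Encode one Server-Sent Event: an event name, one JSON data line, blank line."""
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n".encode("utf-8")

class EventStreamHandler(BaseHTTPRequestHandler):
    # Hypothetical, fixed event source standing in for a live session.
    events = [("tool_call", {"tool": "read"}), ("done", {})]

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        for name, payload in self.events:
            self.wfile.write(format_sse(name, payload))
            self.wfile.flush()  # push each event to the client immediately
```

Serving it would be a matter of passing `EventStreamHandler` to `http.server.ThreadingHTTPServer`; compact JSON keeps every payload on a single `data:` line, which avoids the multi-line framing rules of the SSE format.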
| Module | Role |
|---|---|
| `harness/conversation.py` | The pilot loop: prose + structured tool calls -> execute -> feed real results back. Yields events for the GUI. |
| `harness/server.py` | stdlib HTTP server streaming Session events over SSE; CodeGraph status, wiki graph, settings, upload, sessions. |
| `harness/command_policy.py` | Pure, stdlib-only: timeout resolution + danger classification for the full-auto guard. |
| `harness/wiki_orchestrator.py` | Local-model structuring of a session digest into wiki pages (PM-free, testable). |
| `harness/state.py` | DurableState: clean read layer over Puppetmaster's SwarmStore (jobs, artifacts, live events). |
| `harness/config.py` | Swappable driver, reach, budget. |
| `webapp/` | Electron app (electron/ main + preload + IPC bridges) and the React renderer (src/). |
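
The danger classification described for `harness/command_policy.py` might look something like the pure function below. This is a hedged sketch: the category labels, the regular expressions, and the `classify`/`is_allowed` names are assumptions for illustration, and the patterns cover only a few of the command classes the README lists. They are not a complete or authoritative guard.

```python
import re
from typing import Optional

# Illustrative (label, pattern) pairs; deliberately incomplete.
_DANGER_PATTERNS = [
    ("recursive-delete", re.compile(r"\brm\s+(-[a-zA-Z]*[rR][a-zA-Z]*|--recursive)\b")),
    ("remote-shell", re.compile(r"\b(ssh|scp)\s+\S")),
    ("curl-pipe-shell", re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b")),
    ("force-push", re.compile(r"\bgit\s+push\b.*(--force|\s-f)\b")),
    ("privilege-escalation", re.compile(r"\bsudo\b")),
    ("disk-write", re.compile(r"\bdd\b.*\bof=/dev/|\bmkfs\b")),
]

def classify(command: str) -> Optional[str]:
    """Return the first matching danger label, or None if nothing matches."""
    for label, pattern in _DANGER_PATTERNS:
        if pattern.search(command):
            return label
    return None

def is_allowed(command: str, *, full_auto: bool) -> bool:
    """Screen commands only in full-auto; interactive use passes through."""
    return not full_auto or classify(command) is None
```

Keeping the classifier free of I/O and of any Puppetmaster import is what lets it be unit-tested quickly, which matches the "pure, stdlib-only" role given in the table.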
The research rig (heritage)
Marionette grew out of an eval that asked: which LLMs can drive Puppetmaster as
a native harness layer? That rig still lives in pmharness/ and stays green:
| Module | Role |
|---|---|
| `pmharness/intent.py` | DriverIntent contract + strict validator + lenient text->JSON parser. Pure; no PM dependency. |
| `pmharness/bridge.py` | Executes a validated intent against Puppetmaster's in-process Orchestrator (local adapter; deterministic, free). |
| `pmharness/drivers/` | Driver protocol: StubDriver (offline oracle), OpenAICompatDriver, plus Anthropic/Gemini drivers. |
| `pmharness/scoring.py` | Deterministic, no LLM-as-judge: json_valid, schema_valid, action_correct, executed_ok, composite. |
| `pmharness/ledger.py` | Append-only SQLite record of every attempt. |
Driver eval grades driving, not working: swarm intents execute on
Puppetmaster's free local adapter so ground truth is deterministic and key-free.
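
As an illustration of deterministic, judge-free scoring, the sketch below computes the four boolean checks named in the table and a composite. The intent shape (`action` and `objective` fields), the gating of later checks on earlier ones, and the unweighted-mean composite are assumptions for the example; the real `pmharness/scoring.py` may define them differently.

```python
import json
from dataclasses import dataclass

# Action names taken from the delegation row of the capabilities table.
ALLOWED_ACTIONS = {"run_swarm", "run_implement", "run_parallel"}

@dataclass(frozen=True)
class Score:
    json_valid: bool
    schema_valid: bool
    action_correct: bool
    executed_ok: bool

    @property
    def composite(self) -> float:
        # Assumption: unweighted mean of the four checks.
        checks = (self.json_valid, self.schema_valid, self.action_correct, self.executed_ok)
        return sum(checks) / len(checks)

def score_attempt(raw: str, expected_action: str, executed_ok: bool) -> Score:
    """Grade one driver output with no model in the loop."""
    try:
        intent = json.loads(raw)
    except ValueError:
        return Score(False, False, False, False)
    schema_valid = (
        isinstance(intent, dict)
        and isinstance(intent.get("action"), str)
        and intent["action"] in ALLOWED_ACTIONS
        and isinstance(intent.get("objective"), str)
    )
    action_correct = schema_valid and intent["action"] == expected_action
    # Assumption: execution only counts when the right action was chosen.
    return Score(True, schema_valid, action_correct, action_correct and executed_ok)
```

The same input always yields the same score, which is the property that makes results comparable across drivers and reruns.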
Install and updates
Marionette is a real installed app that updates itself the way Hermes Desktop
does. Install it once from the DMG (drag to /Applications), and from then on
new releases download in the background and apply on the next relaunch -- no
script to run, no DMG to re-download per change. The status-bar update pill
shows when a new version is ready; click it (or just quit and reopen) and the
app relaunches on the new version. Cutting a release is described in RELEASING.md.
Contributors who want to hack on Marionette run it from a git checkout instead;
in that mode the same pill pulls + rebuilds the source in place (see below).
One-command setup on a fresh machine (macOS, Apple Silicon):
curl -fsSL https://raw.githubusercontent.com/professorpalmer/pm-harness/main/scripts/bootstrap.sh | bash
Then launch it for daily use (production renderer; the self-updater rebuilds
into this mode):
bash scripts/start.sh # or add: alias marionette='bash ~/pm-harness/scripts/start.sh'
Run it (contributor / dev)
Desktop app with Vite hot-reload for active editing:
cd webapp && npm install && npm run electron:dev
Or the one-command dev launcher (cleans stale processes, then launches):
bash scripts/dev.sh
The signed, auto-updating DMG is cut with scripts/release.sh -- see RELEASING.md.
That installed app is the primary channel for everyone; the git checkout above
is for people actively developing Marionette.
Research rig (offline, no keys):
python3 -m venv .venv && .venv/bin/pip install -e /path/to/Puppetmaster pytest
.venv/bin/python -m pytest -q # full suite, offline
.venv/bin/python scripts/run_eval.py --drivers stub-oracle # offline oracle ceiling
Configuration
The driver and keys are set in the app (Settings pane) or via env. Key vars:
| Env var | Purpose |
|---|---|
| `OPENROUTER_API_KEY` | Default reach: the whole field through one endpoint. |
| `GEMINI_API_KEY` | Vision sidecar (image transcription); OpenRouter VLM fallback. |
| `HARNESS_DRIVER` | Pilot model id. |
| `HARNESS_COMMAND_TIMEOUT` | Per-command shell timeout in seconds; 0/off = unbounded. |
| `HARNESS_AUTO_COMMAND_GUARD` | Full-auto danger guard; default on, off to disable. |
| `HARNESS_WIKI_ORCHESTRATE` | Local wiki structuring: unset (off), 1/approve (prepare-and-approve), auto (silent ingest). |
| `HARNESS_AUTO_MAX_SWARMS` / `_TOKENS` / `_SECONDS` / `_MAX_IDLE` | Full-auto budget governor ceilings. |
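
A minimal sketch of how three of these variables could be resolved, following the semantics stated in the tables (120s default timeout, 0/off meaning unbounded, guard on by default, wiki structuring off when unset). The function names are hypothetical, and the handling of unparseable or unrecognized values is an assumption, not documented behavior.

```python
from typing import Mapping, Optional

DEFAULT_TIMEOUT_S = 120.0  # default stated in the capabilities table

def resolve_timeout(env: Mapping[str, str]) -> Optional[float]:
    """Return the per-command timeout in seconds, or None for unbounded."""
    raw = env.get("HARNESS_COMMAND_TIMEOUT", "").strip().lower()
    if raw == "":
        return DEFAULT_TIMEOUT_S
    if raw in {"0", "off"}:
        return None
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_TIMEOUT_S  # assumption: fall back on unparseable input
    if value > 0:
        return value
    # Assumption: a numeric zero is unbounded; anything else uses the default.
    return None if value == 0 else DEFAULT_TIMEOUT_S

def guard_enabled(env: Mapping[str, str]) -> bool:
    """The full-auto guard is on unless explicitly set to 'off'."""
    return env.get("HARNESS_AUTO_COMMAND_GUARD", "on").strip().lower() != "off"

def wiki_mode(env: Mapping[str, str]) -> str:
    """Map HARNESS_WIKI_ORCHESTRATE to 'off', 'approve', or 'auto'."""
    raw = env.get("HARNESS_WIKI_ORCHESTRATE", "").strip().lower()
    if raw in {"1", "approve"}:
        return "approve"
    if raw == "auto":
        return "auto"
    return "off"  # assumption: unrecognized values behave like unset
```

Taking a `Mapping` instead of reading `os.environ` directly keeps the functions pure; a caller would pass `os.environ` in production and a plain dict in tests.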
Conventions
- No emojis or decorative pictographs anywhere (code, docs, commits, output).
- stdlib-only backend; Puppetmaster is the single real dependency.
- `pmharness/intent.py` and `harness/command_policy.py` stay PM-free and pure so
  they unit-test fast and hermetically.
- Scoring is deterministic -- no LLM-as-judge.
- Tests before claiming done: `.venv/bin/python -m pytest -q`.
- Never commit keys or `results/*.sqlite`.