pixelpi

agent
Security Audit
Fail
Health Pass
  • License — License: MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 11 GitHub stars
Code Fail
  • network request — Outbound network request in bench/tasks.ts
  • eval() — Dynamic code execution via eval() in examples/primitives-only.ts
Permissions Pass
  • Permissions — No dangerous permissions requested


SUMMARY

A minimal browser-agent harness: six tools, any model. pi for the browser.

README.md

pixelpi

A minimal browser-agent harness. Six tools, raw CDP, any model.

The page is the prompt.


npm i -g pixelpi
[Demo: pixelpi driving a real browser from the terminal]

pixelpi "find the top story on Hacker News": the agent opens a real Chrome, looks once, and reports the title in a few steps. No Playwright, no vision model, no cloud.

Most browser agents bury the model under a 20–30-tool MCP surface and a raw-DOM firehose. pixelpi gives it six primitives and a bounded view of the page. A heavy page that costs ~180k tokens as raw DOM reaches the model as ~2k tokens. Across the real sites measured, that is roughly 7× to 100× fewer tokens, and the cost stays flat as the page grows. The model already knows how to use a browser; pixelpi just gets out of the way.

If pixelpi saves you a 30-tool MCP install, a star helps others find it.

Install

npm i -g pixelpi   # the CLI
pixelpi            # first run → guided setup, then an interactive chat

Quickstart

npm i -g pixelpi                          # 1. install the global binary
pixelpi auth                              # 2. set provider + key (or: export ANTHROPIC_API_KEY=…)
pixelpi "find the top story on Hacker News and store its title"   # 3. run a task

First run with no config drops you into guided setup (provider · key · model), then an interactive browser-agent chat. pixelpi --json "…" emits NDJSON for scripts.

Sessions and login

Every run uses a fresh, disposable Chrome profile by default (logged out). To stay logged in across runs, use a persistent profile:

pixelpi login https://github.com          # opens a real Chrome; sign in, press Enter to save
pixelpi --profile "check my GitHub notifications"   # reuses the saved session, headless
  • --profile uses ~/.pixelpi/profile; --profile=<dir> uses a custom one (handy for separate accounts).
  • Omit --profile for a fresh disposable profile each run.
  • Chrome locks a profile dir, so don't run two tasks against the same profile at once.
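The three profile modes above boil down to a small decision. The sketch below is hypothetical (the function name, the `ProfileChoice` shape and the temp-dir naming are assumptions, not pixelpi's code); it only mirrors the documented rules: no flag means a throwaway profile, a bare `--profile` means `~/.pixelpi/profile`, and `--profile=<dir>` means a custom directory. A directory like this is typically handed to Chrome through its `--user-data-dir` switch.

```typescript
import { homedir, tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical result shape: where Chrome's user-data dir lives and whether to discard it.
interface ProfileChoice {
  dir: string;         // directory passed to Chrome (e.g. via --user-data-dir)
  disposable: boolean; // true = delete after the run, so the next run starts logged out
}

// Sketch only: `flag` is the raw CLI token (undefined, "--profile", or "--profile=<dir>").
// A real implementation would likely create the temp dir with fs.mkdtempSync instead.
function resolveProfile(flag: string | undefined, runId: string): ProfileChoice {
  if (flag === undefined) {
    // No --profile: fresh, disposable profile for this run only.
    return { dir: join(tmpdir(), `pixelpi-${runId}`), disposable: true };
  }
  if (flag === "--profile") {
    // Bare --profile: the shared persistent profile that `pixelpi login` saves into.
    return { dir: join(homedir(), ".pixelpi", "profile"), disposable: false };
  }
  const prefix = "--profile=";
  if (flag.startsWith(prefix)) {
    // Custom dir: handy for keeping separate accounts apart.
    return { dir: flag.slice(prefix.length), disposable: false };
  }
  throw new Error(`unrecognized profile flag: ${flag}`);
}
```

Because Chrome locks whichever directory it opens, two concurrent runs resolving to the same persistent `dir` would collide, while disposable dirs never do.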

pixelpi finds Chrome automatically on macOS, Linux, and Windows. Set PIXELPI_CHROME=/path/to/chrome to override.
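Chrome discovery with an environment override is a common pattern. The sketch below assumes a short list of typical install paths per platform; that list is an illustration, not pixelpi's actual search order. Only the `PIXELPI_CHROME` override comes from the README.

```typescript
import { existsSync } from "node:fs";

// Assumed, commonly seen install locations; pixelpi's real list may differ.
const CANDIDATES: Record<string, string[]> = {
  darwin: ["/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"],
  linux: ["/usr/bin/google-chrome", "/usr/bin/google-chrome-stable", "/usr/bin/chromium"],
  win32: [
    "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
  ],
};

// Hypothetical helper. env/platform/exists are injectable so the logic is testable.
function findChrome(
  env: Record<string, string | undefined> = process.env,
  platform: string = process.platform,
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  // The override wins unconditionally in this sketch.
  const override = env.PIXELPI_CHROME;
  if (override) return override;
  // Otherwise take the first candidate that exists on disk for this OS.
  return (CANDIDATES[platform] ?? []).find(exists);
}
```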

Record and replay

Save a solved run as a trace and replay it later with no model in the loop. The first run is the compile step; every replay is the binary: free, deterministic, and fast.

pixelpi "find the top story on Hacker News" --record hn-top   # solve once, save a trace
pixelpi replay hn-top                                         # rerun it with no model, 0 tokens
pixelpi replay hn-top --heal                                  # repair one step with the model if the page drifted
  • Traces key on the accessibility role and name of each element, not CSS selectors or coordinates, so they survive most layout churn. A bare name lives in ~/.pixelpi/traces/; pass a path (or a name ending in .json) to keep a trace inside a repo.
  • --record writes only when the run completes. Omit the name and it auto-slugs the task.
  • Strict replay needs no API key. On drift it stops and exits 3, naming the step that no longer matches. --heal re-derives just that step with the model and rewrites the trace, so it self-corrects over time.

Replay reproduces actions, not intent: it is for stable, repeated flows (a login, an export, a scrape). --heal is what reintroduces judgment when a page has genuinely changed.
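To see why role-and-name keys survive layout churn, consider strict replay as a lookup: each recorded step names an accessibility role and name, and replay searches the current snapshot for that pair. The trace and snapshot shapes below are hypothetical (pixelpi's trace format is not shown here); the drift-stops-the-run behavior and exit code 3 come from the README.

```typescript
// Hypothetical shapes, for illustration only.
interface SnapshotNode { ref: number; role: string; name: string; }
interface TraceStep { op: "click" | "type"; role: string; name: string; value?: string; }

type ReplayResult =
  | { ok: true; refs: number[] }
  | { ok: false; driftAt: number; step: TraceStep };

// Strict replay: resolve each step by (role, name), never by CSS selector or coordinates.
// A real runner would re-snapshot after every action; here the caller supplies one
// snapshot per step.
function resolveSteps(steps: TraceStep[], snapshots: SnapshotNode[][]): ReplayResult {
  const refs: number[] = [];
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    const nodes = snapshots[i] ?? [];
    const hit = nodes.find((n) => n.role === step.role && n.name === step.name);
    // No match: stop and report which step drifted (what --heal would re-derive).
    if (!hit) return { ok: false, driftAt: i, step };
    refs.push(hit.ref);
  }
  return { ok: true, refs };
}

// Process exit code: 0 on success, 3 on drift, as the README describes.
function exitCode(result: ReplayResult): number {
  return result.ok ? 0 : 3;
}
```

A moved or restyled link still matches as long as its role and accessible name stay the same; a renamed link does not, which is exactly when drift should surface.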

Run over a dataset

A parametrized trace is a function: record it once with example inputs, then run it across a list of inputs in parallel, with zero-token replay per row.

pixelpi "search Hacker News for rust" --record hn   # then name "rust" as an input (q)
pixelpi run hn --query rust                          # one input
pixelpi run hn --over queries.csv --concurrency 8    # map over a CSV/JSONL, in parallel
  • Record naturally and pixelpi offers to turn the values you entered into named inputs, or declare them up front with --param q=rust. pixelpi vars hn re-opens the interactive naming any time.
  • --over takes a .csv (header row = input names) or .jsonl. Each row runs the trace with its values substituted, in its own headless Chrome, bounded by --concurrency (default 4). Outcomes stream to a JSONL file (--out), and --resume skips rows already done.
  • The first row runs alone as a warm-up. With --heal it repairs structural drift once and every other row replays the fix for free, so a 5,000-row job costs one model run plus, at most, a handful of repairs.
  • pixelpi run hn with no input prompts for each one, using the recorded example as the default.
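The `--over` input rule (a `.csv` whose header row names the inputs, or `.jsonl` with one object per line) can be sketched as follows. This is a hypothetical illustration, not pixelpi's parser: the CSV side is deliberately naive and does not handle quoted fields or embedded commas, and values are assumed to be strings.

```typescript
// One row = input name -> value (string values assumed in this sketch).
type Row = Record<string, string>;

// Naive CSV: header row = input names. No quoting/escaping support.
function parseCsv(text: string): Row[] {
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== "");
  if (lines.length === 0) return [];
  const header = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row: Row = {};
    header.forEach((name, i) => {
      row[name] = (cells[i] ?? "").trim(); // missing cells become ""
    });
    return row;
  });
}

// JSONL: one JSON object per non-empty line.
function parseJsonl(text: string): Row[] {
  return text
    .split(/\r?\n/)
    .filter((l) => l.trim() !== "")
    .map((l) => JSON.parse(l) as Row);
}

// Dispatch on the file extension, as --over does with .csv / .jsonl.
function parseRows(filename: string, text: string): Row[] {
  if (filename.endsWith(".csv")) return parseCsv(text);
  if (filename.endsWith(".jsonl")) return parseJsonl(text);
  throw new Error(`expected .csv or .jsonl, got ${filename}`);
}
```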

This shines for homogeneous, repeated flows. A row whose page genuinely differs surfaces as a per-row drift outcome, not a wrong result.
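The scheduling described above (first row alone as a warm-up, the rest bounded by `--concurrency`, results kept in input order) is a worker-pool pattern. The helper below is a hypothetical sketch of that pattern, not pixelpi's implementation; `task` stands in for "replay the trace with this row's values".

```typescript
// Hypothetical helper. Runs `task` over `items` with at most `limit` in flight and
// returns results in input order. Item 0 runs alone first as the warm-up (where a
// --heal repair would happen once). `task` should resolve to an outcome object
// rather than throw: a single rejection rejects the whole Promise.all below.
async function mapWithWarmup<T, R>(
  items: T[],
  task: (item: T, index: number) => Promise<R>,
  limit = 4, // mirrors the README's default --concurrency
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  if (items.length === 0) return results;

  results[0] = await task(items[0], 0); // warm-up row, alone

  let next = 1; // shared cursor; safe because JS only interleaves at awaits
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i); // write by index, so order is preserved
    }
  };
  const width = Math.min(Math.max(1, limit), items.length - 1);
  await Promise.all(Array.from({ length: width }, worker));
  return results;
}
```

A pool of `width` workers pulling from a shared cursor keeps every slot busy, unlike fixed-size batches that wait for their slowest row before starting the next batch.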

Describe a trace (for humans and agents)

Every trace is an introspectable function. describe shows its inputs and output:

pixelpi describe hn            # human card: task, inputs, output, usage
pixelpi describe hn --json     # {"type":"description","params":[...],"output":{...}}

Under --json, every command emits one NDJSON stream (progress, results, and errors as {"type":"error","code":...}), so an agent can drive pixelpi and parse a single clean contract.
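Because every `--json` command speaks one NDJSON contract, the consumer side can stay small. This sketch assumes only what the README states: each line is a JSON object with a `type` field, and errors arrive as `{"type":"error","code":...}`. The specific error codes are not listed here, so the sketch does not interpret them.

```typescript
// Generic event: every line carries a "type"; other fields depend on the type.
interface PixelpiEvent {
  type: string;
  [key: string]: unknown;
}

// Split an NDJSON stream into ordinary events and error events.
function parseNdjson(stream: string): { events: PixelpiEvent[]; errors: PixelpiEvent[] } {
  const events: PixelpiEvent[] = [];
  const errors: PixelpiEvent[] = [];
  for (const line of stream.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank/trailing lines
    const ev = JSON.parse(line) as PixelpiEvent;
    (ev.type === "error" ? errors : events).push(ev);
  }
  return { events, errors };
}
```

Fed the output of `pixelpi describe hn --json`, the `events` array would hold the `{"type":"description",...}` object.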

The six primitives

look · act · fill · nav · eval · store
  • look(mode?, filter?): compact, ref-indexed accessibility/DOM snapshot. The read.
  • act(ref, op, value?): mutate the page by stable ref via trusted CDP input events. The write/edit.
  • fill(fields[]): batched form fill in one call.
  • nav(action, arg?): navigate, tabs, waitfor. The cd / processes.
  • eval(fn, args?, opts?): arbitrary JS in the page realm. The escape hatch, the bash of the browser.
  • store(action, key?, value?): durable host-side JSON KV. The filesystem.
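What "compact, ref-indexed" and "bounded" can mean in practice: one short line per element, each with a numeric ref the model can pass to `act`, and a hard stop at a token budget. The line format, node shape and chars/4 token estimate below are assumptions for illustration; the real `look()` output format is not shown in this README.

```typescript
// Hypothetical accessibility node (real look() output may differ).
interface A11yNode { ref: number; role: string; name: string; }

// Render one compact line per node; stop at a rough token budget.
// Tokens are approximated as chars / 4 (a rule of thumb, not a tokenizer).
function renderSnapshot(nodes: A11yNode[], maxTokens = 2000): string {
  const lines: string[] = [];
  let used = 0;
  for (const n of nodes) {
    const line = `[${n.ref}] ${n.role} "${n.name}"`;
    const cost = Math.ceil(line.length / 4) + 1; // +1 roughly for the newline
    if (used + cost > maxTokens) {
      // lines.length == number of nodes rendered so far
      lines.push(`... ${nodes.length - lines.length} more nodes omitted`);
      break;
    }
    lines.push(line);
    used += cost;
  }
  return lines.join("\n");
}
```

The budget is what keeps the view flat: a page with ten thousand nodes costs about the same as one with a few hundred.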

Elements are addressed by stable ref (not CSS/coordinates): cheap, deterministic, resilient to layout churn. Everything else is composable from eval; the agent writes its own higher-level tools as JSON skills at runtime, and only each skill's one-line description enters the prompt.
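The self-extension idea keeps the prompt small: a skill's full body stays on the host, and only its one-line description is shown to the model. The registry below is a hypothetical sketch of that split; the field names and the assumption that a skill body runs through the `eval` primitive are illustrative, not pixelpi's schema.

```typescript
// Hypothetical skill shape (field names are assumptions).
interface Skill {
  name: string;
  description: string; // one line: the only part the model sees up front
  fn: string;          // JS source, assumed to run in the page via eval()
}

class SkillRegistry {
  private skills = new Map<string, Skill>();

  add(skill: Skill): void {
    if (skill.description.includes("\n")) throw new Error("description must be one line");
    this.skills.set(skill.name, skill);
  }

  // Prompt section: one line per skill; bodies are withheld.
  promptSection(): string {
    return Array.from(this.skills.values())
      .map((s) => `- ${s.name}: ${s.description}`)
      .join("\n");
  }

  // Full skill, fetched only when the model decides to call it.
  get(name: string): Skill | undefined {
    return this.skills.get(name);
  }
}
```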

Why it's different

| | pixelpi | Playwright MCP | Chrome DevTools MCP |
|---|---|---|---|
| Tools in context | 6 | 21 | 31 |
| Tool-def + prompt tokens | ~1,055 | ~13,700 | ~18,000 |
| Page representation | a11y tree (bounded) | mixed | mixed |
| Substrate | raw CDP (no Playwright) | Playwright | CDP |
| Self-extension | agent writes JS skills at runtime | no | no |
| Replay | record once, replay with 0 tokens | no | no |
| Parallel fan-out | record once, run a whole dataset for ~0 tokens/row | no | no |

Token cost of look() vs a raw-DOM dump, measured on the 15 sites WebVoyager tests against (excerpt below; full table and script in bench/):

| Site | look() | raw DOM | factor |
|---|---|---|---|
| Coursera | 1,997 tok | 202,892 tok | 101.6× |
| GitHub | 1,955 tok | 146,787 tok | 75.1× |
| Apple | 2,254 tok | 96,507 tok | 42.8× |
| Hugging Face | 1,932 tok | 45,300 tok | 23.4× |
| ArXiv | 1,588 tok | 10,652 tok | 6.7× |

Across the sites measured, look() uses roughly 7× to 100× fewer tokens than the raw DOM (37× median). look() stays near ~2k tokens however heavy the page is, while the raw DOM keeps growing. Five of the sites bot-block headless Chrome and return an empty page; bench/ has the full run. Reproduce it yourself with pnpm bench:tokens (no key needed).
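The factor column is simply raw-DOM tokens divided by look() tokens. The snippet below recomputes it from the five excerpted rows (numbers copied from the table); it prints the same one-decimal factors as the table's last column.

```typescript
// [site, look() tokens, raw-DOM tokens], copied from the excerpt table.
const rows: Array<[site: string, look: number, raw: number]> = [
  ["Coursera", 1997, 202892],
  ["GitHub", 1955, 146787],
  ["Apple", 2254, 96507],
  ["Hugging Face", 1932, 45300],
  ["ArXiv", 1588, 10652],
];

// factor = raw / look, rounded to one decimal place.
const factors = rows.map(([site, look, raw]) => ({
  site,
  factor: Math.round((raw / look) * 10) / 10,
}));

for (const { site, factor } of factors) console.log(`${site}: ${factor}×`);
```

Note that these five rows are only an excerpt, so their spread is not the same statistic as the median quoted for the full run in bench/.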

SDK usage

Drive the full agent loop from code:

import { createBrowserAgentSession } from "pixelpi";

const session = await createBrowserAgentSession({
  task: "extract all job listings from https://example.co/careers into JSON",
  launch: { headless: true },
});
try {
  const result = await session.run();
  console.log(result.finalText);
} finally {
  await session.close();
}

Or use the six primitives directly against raw CDP, no model in the loop:

import { launchChrome, createBrowserTools } from "@josharsh/pixelpi-cdp";
import { MemoryStore } from "@josharsh/pixelpi-core";

const { session, close } = await launchChrome({ headless: true, startUrl: "https://news.ycombinator.com" });
const [look, , , , evalJs] = createBrowserTools({ session, store: new MemoryStore() });
const ctx = { signal: new AbortController().signal, emit: () => {} };

console.log((await look.execute({}, ctx)).content);                 // compact a11y snapshot
console.log((await evalJs.execute({ fn: "return document.title" }, ctx)).content);
await close();

Or load a saved trace as a callable function, no model or API key required:

import { loadTrace } from "pixelpi";

const hn = loadTrace("hn");                          // by name (home library) or path
console.log(hn.describe());                          // { params, output, ... }
const r  = await hn({ query: "rust" });              // run once  -> { ok, output }

const rows = [{ query: "rust" }, { query: "go" }];   // one object per input row
const rs = await hn.over(rows, { concurrency: 4 });  // map over a dataset, results in input order

More in examples/.

Philosophy

The model is the harness now, so you expose the substrate's irreducible primitives and let the agent compose the rest. See docs/how-it-works.md for the moving parts (why six tools, why raw CDP, why no MCP).

Contributing

Issues and PRs welcome. Run pnpm install && pnpm build && pnpm test before opening a PR. See CONTRIBUTING.md.

Status

Substrate (look/eval) is validated live against real sites. The agent loop, guards, stores, and provider adapters are unit-tested (216 tests, mock provider, no network in tests). The full LLM↔browser loop runs once you supply an API key. Requires Node ≥ 20 and Google Chrome (macOS, Linux, or Windows).

License

MIT © 2026 Harsh Joshi
