pixelpi
Health: Passed
- License — License: MIT
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Community trust — 11 GitHub stars
Code: Failed
- network request — Outbound network request in bench/tasks.ts
- eval() — Dynamic code execution via eval() in examples/primitives-only.ts
Permissions: Passed
- Permissions — No dangerous permissions requested
A minimal browser-agent harness - six tools, any model. pi for the browser.
pixelpi
A minimal browser-agent harness. Six tools, raw CDP, any model.
The page is the prompt.
npm i -g pixelpi
pixelpi "find the top story on Hacker News": the agent opens a real Chrome, looks once, and reports the title in a few steps. No Playwright, no vision model, no cloud.
Most browser agents bury the model under a 20–30-tool MCP surface and a raw-DOM firehose. pixelpi gives it six primitives and a bounded view of the page. A heavy page that costs ~180k tokens as raw DOM reaches the model in ~2k. That's a 37× median reduction, and up to 100×, across real sites, and the look() cost stays flat as the page grows. The model already knows how to use a browser; pixelpi just gets out of the way.
If pixelpi saves you a 30-tool MCP install, a star helps others find it.
Install
npm i -g pixelpi # the CLI
pixelpi # first run → guided setup, then an interactive chat
Quickstart
npm i -g pixelpi # 1. install the global binary
pixelpi auth # 2. set provider + key (or: export ANTHROPIC_API_KEY=…)
pixelpi "find the top story on Hacker News and store its title" # 3. run a task
First run with no config drops you into guided setup (provider · key · model), then an interactive browser-agent chat. pixelpi --json "…" emits NDJSON for scripts.
Sessions and login
Every run uses a fresh, disposable Chrome profile by default (logged out). To stay logged in across runs, use a persistent profile:
pixelpi login https://github.com # opens a real Chrome; sign in, press Enter to save
pixelpi --profile "check my GitHub notifications" # reuses the saved session, headless
- --profile uses ~/.pixelpi/profile; --profile=<dir> uses a custom one (handy for separate accounts).
- Omit --profile for a fresh disposable profile each run.
- Chrome locks a profile dir, so don't run two tasks against the same profile at once.
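Because Chrome locks a profile directory, a script that starts several tasks against the same saved profile has to run them one at a time. Here is a minimal sketch of one way to do that in the calling code. `withProfileLock` is a hypothetical helper, not part of pixelpi, and the `job` callback stands in for however you invoke a task.

```typescript
// Hypothetical helper (not a pixelpi API): queue jobs per profile directory
// so that two jobs never use the same Chrome profile at the same time.
const profileQueues = new Map<string, Promise<unknown>>();

function withProfileLock<T>(profileDir: string, job: () => Promise<T>): Promise<T> {
  // Start after whatever is already queued for this profile directory.
  const previous: Promise<unknown> = profileQueues.get(profileDir) ?? Promise.resolve();
  const next = previous.then(job);
  // Store a tail that never rejects, so one failed job does not block the queue.
  profileQueues.set(profileDir, next.catch(() => undefined));
  return next;
}
```

Jobs that use different profile directories still run concurrently; only jobs that share a directory are serialized.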
pixelpi finds Chrome automatically on macOS, Linux, and Windows. Set PIXELPI_CHROME=/path/to/chrome to override.
Record and replay
Save a solved run as a trace and replay it later with no model in the loop. The first run is the compile step; every replay is the binary: free, deterministic, and fast.
pixelpi "find the top story on Hacker News" --record hn-top # solve once, save a trace
pixelpi replay hn-top # rerun it with no model, 0 tokens
pixelpi replay hn-top --heal # repair one step with the model if the page drifted
- Traces key on the accessibility role and name of each element, not CSS selectors or coordinates, so they survive most layout churn. A bare name lives in ~/.pixelpi/traces/; pass a path (or a name ending in .json) to keep a trace inside a repo.
- --record writes only when the run completes. Omit the name and it auto-slugs the task.
- Strict replay needs no API key. On drift it stops and exits 3, naming the step that no longer matches. --heal re-derives just that step with the model and rewrites the trace, so it self-corrects over time.
Replay reproduces actions, not intent: it is for stable, repeated flows (a login, an export, a scrape). --heal is what reintroduces judgment when a page has genuinely changed.
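Keying on role and name can be pictured as a lookup against the current accessibility snapshot. The sketch below rests on assumptions: `AxNode`, `TraceStep`, and `resolveSteps` are hypothetical names, the real trace schema is not shown here, and every step is resolved against a single snapshot for simplicity. Only the idea comes from the text: match on role and name, and report the first step that no longer matches.

```typescript
// Hypothetical shapes; pixelpi's real trace format may differ.
interface AxNode { ref: string; role: string; name: string }
interface TraceStep { op: string; role: string; name: string }

type Resolution =
  | { ok: true; refs: string[] }
  | { ok: false; driftedStep: number };

// Resolve each step to an element ref by (role, name). Stop at the first
// step with no match and report its index, which is the "drift" case.
function resolveSteps(steps: TraceStep[], snapshot: AxNode[]): Resolution {
  const refs: string[] = [];
  let index = 0;
  for (const step of steps) {
    const hit = snapshot.find((n) => n.role === step.role && n.name === step.name);
    if (!hit) return { ok: false, driftedStep: index };
    refs.push(hit.ref);
    index++;
  }
  return { ok: true, refs };
}
```

Because the lookup ignores position and ref values, reordering the page or renumbering refs does not break a step; renaming a button or changing its role does.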
Run over a dataset
A parametrized trace is a function: record it once with example inputs, then run it across a list of inputs in parallel, with zero-token replay per row.
pixelpi "search Hacker News for rust" --record hn # then name "rust" as an input (q)
pixelpi run hn --query rust # one input
pixelpi run hn --over queries.csv --concurrency 8 # map over a CSV/JSONL, in parallel
- Record naturally and pixelpi offers to turn the values you entered into named inputs, or declare them up front with --param q=rust. pixelpi vars hn re-opens the interactive naming any time.
- --over takes a .csv (header row = input names) or .jsonl. Each row runs the trace with its values substituted, in its own headless Chrome, bounded by --concurrency (default 4). Outcomes stream to a JSONL file (--out), and --resume skips rows already done.
- The first row runs alone as a warm-up. With --heal it repairs structural drift once and every other row replays the fix for free, so a 5,000-row job costs one model run plus, at most, a handful of repairs.
- pixelpi run hn with no input prompts for each one, using the recorded example as the default.
This shines for homogeneous, repeated flows. A row whose page genuinely differs surfaces as a per-row drift outcome, not a wrong result.
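The fan-out described above (first row alone, then the rest under a concurrency bound) can be sketched as a small worker pool. This is an illustration, not pixelpi's implementation: `mapOver` and `runRow` are hypothetical names, and `runRow` stands in for replaying a trace with one row's inputs. For brevity a failing row rejects the whole call here, whereas the text describes per-row outcomes.

```typescript
// Hypothetical sketch: warm-up row first, then a bounded pool of workers.
async function mapOver<In, Out>(
  rows: In[],
  runRow: (row: In) => Promise<Out>,
  concurrency = 4, // matches the stated default for --concurrency
): Promise<Out[]> {
  const results = new Array<Out>(rows.length);
  if (rows.length === 0) return results;

  // Warm-up: run the first row by itself before fanning out.
  results[0] = await runRow(rows[0]);

  // Each worker repeatedly claims the next unclaimed row index.
  let next = 1;
  const worker = async (): Promise<void> => {
    while (next < rows.length) {
      const i = next++;
      results[i] = await runRow(rows[i]);
    }
  };

  const workers: Promise<void>[] = [];
  const poolSize = Math.min(concurrency, rows.length - 1);
  for (let w = 0; w < poolSize; w++) workers.push(worker());
  await Promise.all(workers);
  return results; // same order as `rows`, regardless of completion order
}
```

Writing each result to its own index keeps the output in input order even when rows finish out of order.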
Describe a trace (for humans and agents)
Every trace is an introspectable function. describe shows its inputs and output:
pixelpi describe hn # human card: task, inputs, output, usage
pixelpi describe hn --json # {"type":"description","params":[...],"output":{...}}
Under --json, every command emits one NDJSON stream (progress, results, and errors as {"type":"error","code":...}), so an agent can drive pixelpi and parse a single clean contract.
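A consumer of that stream mostly needs to split lines, parse each as JSON, and branch on `type`. The sketch below assumes only what the text states: one JSON object per line, and error events shaped like {"type":"error","code":...}. The `PixelpiEvent` type, the `parseNdjson` helper, and the sample error code are made up for illustration.

```typescript
// One event per line. Only `type` (and `code` on errors) is taken from the
// README; any other fields are passed through untouched.
interface PixelpiEvent { type: string; [key: string]: unknown }

function parseNdjson(stream: string): { events: PixelpiEvent[]; errors: PixelpiEvent[] } {
  const events: PixelpiEvent[] = [];
  const errors: PixelpiEvent[] = [];
  for (const line of stream.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    const parsedLine = JSON.parse(trimmed) as PixelpiEvent;
    (parsedLine.type === "error" ? errors : events).push(parsedLine);
  }
  return { events, errors };
}

const sampleStream = [
  '{"type":"description","params":[],"output":{}}',
  '{"type":"error","code":"E_EXAMPLE"}', // hypothetical error code
].join("\n");

const parsed = parseNdjson(sampleStream);
console.log(parsed.events.length, parsed.errors.length); // 1 1
```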
The six primitives
look · act · fill · nav · eval · store
- look(mode?, filter?): compact, ref-indexed accessibility/DOM snapshot. The read.
- act(ref, op, value?): mutate the page by stable ref via trusted CDP input events. The write/edit.
- fill(fields[]): batched form fill in one call.
- nav(action, arg?): navigate, tabs, waitfor. The cd / processes.
- eval(fn, args?, opts?): arbitrary JS in the page realm. The escape hatch, the bash of the browser.
- store(action, key?, value?): durable host-side JSON KV. The filesystem.
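To make "compact, ref-indexed" concrete, here is a rough sketch of rendering a capped snapshot in which each element gets a short ref the model can pass back to act. The node shape, the ref format, and the output layout are all assumptions; the real look() output is not specified here. Refs that stay stable across snapshots would also need more than a positional index.

```typescript
// Hypothetical node shape and output format, for illustration only.
interface A11yNode { role: string; name: string }

function renderSnapshot(nodes: A11yNode[], maxNodes: number): string {
  const shown = nodes.slice(0, maxNodes);
  // One short line per element: a ref, its role, and its accessible name.
  const lines = shown.map((node, i) => `[e${i + 1}] ${node.role} "${node.name}"`);
  const hidden = nodes.length - shown.length;
  // The cap is what keeps the output size flat as the page grows.
  if (hidden > 0) lines.push(`… +${hidden} more`);
  return lines.join("\n");
}
```

A page with ten thousand nodes and a page with fifty produce output of about the same size once both exceed the cap, which is the property the token benchmark measures.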
Elements are addressed by stable ref (not CSS/coordinates): cheap, deterministic, resilient to layout churn. Everything else is composable from eval; the agent writes its own higher-level tools as JSON skills at runtime, and only each skill's one-line description enters the prompt.
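The point about skills is that their bodies stay out of the prompt. A minimal sketch of that split, assuming a hypothetical `Skill` shape (the field names and the sample skill are invented for illustration):

```typescript
// Hypothetical skill record: a name, a one-line description, and a JS body
// that would be run through eval when the skill is invoked.
interface Skill { name: string; description: string; body: string }

// Only names and descriptions are rendered for the prompt; bodies are not.
function promptSummary(skills: Skill[]): string {
  return skills.map((s) => `${s.name}: ${s.description}`).join("\n");
}

const exampleSkills: Skill[] = [
  {
    name: "countLinks",
    description: "Count the links on the current page.",
    body: "return document.querySelectorAll('a').length",
  },
];

console.log(promptSummary(exampleSkills));
```

The prompt cost of a skill is then one line regardless of how long its body is.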
Why it's different
| | pixelpi | Playwright MCP | Chrome DevTools MCP |
|---|---|---|---|
| Tools in context | 6 | 21 | 31 |
| Tool-def + prompt tokens | ~1,055 | ~13,700 | ~18,000 |
| Page representation | a11y tree (bounded) | mixed | mixed |
| Substrate | raw CDP (no Playwright) | Playwright | CDP |
| Self-extension | agent writes JS skills at runtime | no | no |
| Replay | record once, replay with 0 tokens | no | no |
| Parallel fan-out | record once, run a whole dataset for ~0 tokens/row | no | no |
Token cost: look() vs a raw-DOM dump, measured across the 15 sites WebVoyager tests on (full table + script in bench/):
| Site | look() | raw DOM | factor |
|---|---|---|---|
| Coursera | 1,997 tok | 202,892 tok | 101.6× |
| GitHub | 1,955 tok | 146,787 tok | 75.1× |
| Apple | 2,254 tok | 96,507 tok | 42.8× |
| Hugging Face | 1,932 tok | 45,300 tok | 23.4× |
| ArXiv | 1,588 tok | 10,652 tok | 6.7× |
The median reduction across these sites is 37×, and the heaviest pages reach about 100×; light pages such as ArXiv save less (6.7×). look() holds ~2k tokens whatever the page weighs, while the raw DOM keeps growing. Five of the twelve sites bot-block headless Chrome and return an empty page; bench/ has the full run. Reproduce it yourself with pnpm bench:tokens, no key needed.
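The factor column is the raw-DOM token count divided by the look() token count. The following snippet recomputes it from the five rows shown in the table; the variable and function names are illustrative and not taken from bench/.

```typescript
// Token counts copied from the rows of the table shown here.
const measurements = [
  { site: "Coursera", look: 1997, rawDom: 202892 },
  { site: "GitHub", look: 1955, rawDom: 146787 },
  { site: "Apple", look: 2254, rawDom: 96507 },
  { site: "Hugging Face", look: 1932, rawDom: 45300 },
  { site: "ArXiv", look: 1588, rawDom: 10652 },
];

// Reduction factor, rounded to one decimal place as in the table.
function reductionFactor(lookTokens: number, rawDomTokens: number): number {
  return Math.round((rawDomTokens / lookTokens) * 10) / 10;
}

for (const m of measurements) {
  console.log(`${m.site}: ${reductionFactor(m.look, m.rawDom)}×`);
}
```

The look() column varies by only a few hundred tokens across these rows while the raw DOM spans a 19× range, so the factor is driven almost entirely by how heavy the page is.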
SDK usage
Drive the full agent loop from code:
import { createBrowserAgentSession } from "pixelpi";
const session = await createBrowserAgentSession({
task: "extract all job listings from https://example.co/careers into JSON",
launch: { headless: true },
});
try {
const result = await session.run();
console.log(result.finalText);
} finally {
await session.close();
}
Or use the six primitives directly against raw CDP, no model in the loop:
import { launchChrome, createBrowserTools } from "@josharsh/pixelpi-cdp";
import { MemoryStore } from "@josharsh/pixelpi-core";
const { session, close } = await launchChrome({ headless: true, startUrl: "https://news.ycombinator.com" });
const [look, , , , evalJs] = createBrowserTools({ session, store: new MemoryStore() });
const ctx = { signal: new AbortController().signal, emit: () => {} };
console.log((await look.execute({}, ctx)).content); // compact a11y snapshot
console.log((await evalJs.execute({ fn: "return document.title" }, ctx)).content);
await close();
Or load a saved trace as a callable function, no model or API key required:
import { loadTrace } from "pixelpi";
const hn = loadTrace("hn"); // by name (home library) or path
console.log(hn.describe()); // { params, output, ... }
const r = await hn({ query: "rust" }); // run once -> { ok, output }
const rows = [{ query: "rust" }, { query: "go" }]; // example inputs (assumed shape: one object per row)
const rs = await hn.over(rows, { concurrency: 4 }); // map over a dataset, results in input order
More in examples/.
Philosophy
The model is the harness now, so you expose the substrate's irreducible primitives and let the agent compose the rest. See docs/how-it-works.md for the moving parts (why six tools, why raw CDP, why no MCP).
Contributing
Issues and PRs welcome. Run pnpm install && pnpm build && pnpm test before opening a PR. See CONTRIBUTING.md.
Status
Substrate (look/eval) is validated live against real sites. The agent loop, guards, stores, and provider adapters are unit-tested (216 tests, mock provider, no network in tests). The full LLM↔browser loop runs once you supply an API key. Requires Node ≥ 20 and Google Chrome (macOS, Linux, or Windows).
License
MIT © 2026 Harsh Joshi