pilot — Your AI Agent, Inside Your Real Browser

Your AI agent controls a tab in your real Chrome — already logged in, no bots blocked, no CAPTCHAs.

pilot demo

Other browser tools launch a separate headless browser. Your agent starts anonymous, gets blocked by Cloudflare, can't access anything behind login.

pilot takes a different approach: it controls a tab in the browser you're already using. Your agent sees what you see — logged into GitHub, Linear, Notion, your internal tools. No cookie hacks. No re-authentication. No bot detection.

Quick Start

1. Install pilot

npx pilot-mcp
npx playwright install chromium

Add to .mcp.json (Claude Code) or MCP settings (Cursor):

{
  "mcpServers": {
    "pilot": {
      "command": "npx",
      "args": ["-y", "pilot-mcp"]
    }
  }
}

2. Install the Chrome extension

npx pilot-mcp --install-extension

This opens Chrome's extensions page and shows the folder path. Click Load unpacked → paste the path. You'll see the ✈️ Pilot icon — badge shows ON when connected.

3. Use it

Tell your agent:

"Go to my GitHub notifications and summarize them"

The agent navigates in a real Chrome tab — already logged in as you. No setup. No cookies. No Cloudflare blocks.

Two Modes

Extension Mode — your real browser

The Pilot Chrome extension connects to the MCP server via WebSocket. Your agent gets its own tab in your real browser — with all your sessions, cookies, and logged-in state already there.

AI Agent → MCP (stdio) → pilot → WebSocket → Chrome Extension → Your Browser Tab

No Cloudflare blocks (real browser fingerprint)
Already authenticated everywhere
Multiple agents get separate tabs (multiplexed)
You can watch the agent work in real-time

This is how pilot is meant to be used.

Headed Mode — visible Chromium

When the extension isn't connected, pilot opens a visible Chromium window. You can see everything the agent does and intervene when needed.

Import cookies from your real browser to authenticate:

pilot_import_cookies({ browser: "chrome", domains: [".github.com", ".linear.app"] })

Supports Chrome, Arc, Brave, Edge, Comet via macOS Keychain / Linux libsecret.

When the agent hits a CAPTCHA or bot wall, it hands control to you:

pilot_handoff — pauses automation, you solve the challenge
pilot_resume — agent continues where it left off

Lean Snapshots

Large page snapshots eat context windows. pilot is opinionated about keeping things small:

Navigate returns a ~2K char preview, not a 50K+ page dump
Snapshot supports max_elements, interactive_only, lean, structure_only
Snapshot diff shows only what changed — no redundant re-reads

Other tools:   navigate(58K) → navigate(58K) → answer        = 116K chars
pilot:         navigate(2K)  → navigate(2K)  → snapshot(9K)  =  13K chars

Less context = faster inference, cheaper API calls, fewer failures.

pilot vs @playwright/mcp

Both are solid tools. Here's what's actually different:

	pilot	@playwright/mcp
Real browser control	Extension controls a tab in your Chrome	Extension for session reuse (no DOM control)
Bot detection	Not an issue (real browser) + handoff/resume	❌ blocked by Cloudflare
Cookie import	Decrypt from Chrome, Arc, Brave, Edge, Comet	❌ (manual `--storage-state` JSON)
Default snapshot size	~2K on navigate, ~9K full snapshot	~50-60K on navigate
Snapshot diffing	`pilot_snapshot_diff`	❌
Token control	`max_elements`, `interactive_only`, `lean`, `structure_only`	`--snapshot-mode` (incremental/full/none)
Iframe support	`pilot_frames`, `pilot_frame_select`, `pilot_frame_reset`	❌
Ad blocking	`pilot_block` with `ads` preset	`--blocked-origins` (manual)
Tool profiles	`core` (9) / `standard` (30) / `full` (61)	Capability groups via `--caps`
Transport	stdio	stdio, HTTP, SSE
Persistent sessions	`pilot_auth` + cookie import	`--user-data-dir`, `--storage-state`
Network interception	`pilot_intercept`	`browser_route`
Assertions	`pilot_assert`	Verify tools via `--caps=testing`

Use pilot when: You need your agent to work on authenticated sites, you want lean context, or you're tired of Cloudflare blocks.

Use @playwright/mcp when: You need HTTP/SSE transport, Windows auth support, or you prefer Microsoft's ecosystem.

Tool Profiles

61 tools is too many for most LLMs — research shows degradation past ~30. Load only what you need:

Profile	Tools	Use case
`core`	9	Simple automation — navigate, snapshot, click, fill, type, press_key, wait, screenshot
`standard`	30	Common workflows — core + tabs, scroll, hover, drag, iframes, auth, block, find
`full`	61	Everything, including network mocking, assertions, clipboard, geolocation

{
  "mcpServers": {
    "pilot": {
      "command": "npx",
      "args": ["-y", "pilot-mcp"],
      "env": { "PILOT_PROFILE": "standard" }
    }
  }
}

Default is standard (30 tools).

All Tools (61)

Navigation

Tool	Description
`pilot_get`	Navigate and return full readable content + interactive elements in one call
`pilot_navigate`	Navigate to a URL. Returns content preview + interactive elements (~2K chars)
`pilot_back`	Go back in browser history
`pilot_forward`	Go forward in browser history
`pilot_reload`	Reload the current page

Snapshots

Tool	Description
`pilot_snapshot`	Accessibility tree with `@eN` refs. Supports `max_elements`, `structure_only`, `interactive_only`, `lean`, `compact`, `depth`
`pilot_snapshot_diff`	Unified diff showing what changed since last snapshot
`pilot_find`	Find element by visible text, label, or role — returns a ref without a full snapshot
`pilot_annotated_screenshot`	Screenshot with red boxes at each `@ref` position

Interaction

Tool	Description
`pilot_click`	Click by `@ref` or CSS selector
`pilot_hover`	Hover over an element
`pilot_fill`	Clear and fill an input/textarea
`pilot_select_option`	Select a dropdown option
`pilot_type`	Type text character by character
`pilot_press_key`	Press keyboard keys
`pilot_drag`	Drag from one element to another
`pilot_scroll`	Scroll element or page
`pilot_wait`	Wait for element, network idle, or page load
`pilot_file_upload`	Upload files to a file input

Iframes

Tool	Description
`pilot_frames`	List all iframes
`pilot_frame_select`	Switch context into an iframe
`pilot_frame_reset`	Switch back to main frame

Page Inspection

Tool	Description
`pilot_page_text`	Clean text extraction
`pilot_page_html`	Get innerHTML of element or full page
`pilot_page_links`	All links as text + href pairs
`pilot_page_forms`	All form fields as structured JSON
`pilot_page_attrs`	All attributes of an element
`pilot_page_css`	Computed CSS property value
`pilot_element_state`	Check visible/hidden/enabled/disabled/checked/focused
`pilot_page_diff`	Text diff between two URLs

Debugging

Tool	Description
`pilot_console`	Console messages from circular buffer
`pilot_network`	Network requests from circular buffer
`pilot_dialog`	Captured alert/confirm/prompt messages
`pilot_evaluate`	Run JavaScript on the page
`pilot_cookies`	Get all cookies as JSON
`pilot_storage`	Get localStorage/sessionStorage
`pilot_perf`	Page load performance timings

Visual

Tool	Description
`pilot_screenshot`	Screenshot of page or element
`pilot_pdf`	Save page as PDF
`pilot_responsive`	Screenshots at mobile, tablet, desktop

Tabs

Tool	Description
`pilot_tabs`	List open tabs
`pilot_tab_new`	Open a new tab
`pilot_tab_close`	Close a tab
`pilot_tab_select`	Switch to a tab

Session & Auth

Tool	Description
`pilot_import_cookies`	Import cookies from Chrome, Arc, Brave, Edge, Comet via Keychain decryption
`pilot_auth`	Save/load/clear full session state (cookies + localStorage + sessionStorage)
`pilot_set_cookie`	Set a cookie manually
`pilot_set_header`	Set custom request headers
`pilot_set_useragent`	Set user agent string
`pilot_handle_dialog`	Configure dialog auto-accept/dismiss
`pilot_resize`	Set viewport size
`pilot_block`	Block requests by URL pattern or `ads` preset
`pilot_geolocation`	Set fake GPS coordinates
`pilot_cdp`	Connect to a real Chrome instance via CDP
`pilot_extension_status`	Check Chrome extension connection status
`pilot_handoff`	Open headed Chrome for manual interaction (CAPTCHA, auth)
`pilot_resume`	Resume automation after handoff
`pilot_close`	Close browser and clean up

Automation (full profile)

Tool	Description
`pilot_intercept`	Intercept requests and return custom responses
`pilot_assert`	Assert URL, text, element state, or value
`pilot_clipboard`	Read or write clipboard content

Extension Architecture

The Pilot extension uses a broker/client model — multiple AI sessions share one extension, each getting its own tab:

Claude Code Session A ──┐
                        ├→ pilot broker (ws://127.0.0.1:3131) → Chrome Extension → Tab 1
Claude Code Session B ──┘                                                       → Tab 2

Each session's tab is color-grouped in Chrome so you can see which tab belongs to which agent.

Requirements

Node.js >= 18
Chrome + Pilot extension (recommended)
macOS or Linux (for cookie import in headed mode)
Chromium: npx playwright install chromium (for headed mode)

Security

Variable	Default	Description
`PILOT_PROFILE`	`standard`	Tool set: `core` (9), `standard` (30), or `full` (61)
`PILOT_OUTPUT_DIR`	System temp	Restricts where screenshots/PDFs can be written

Extension communicates over localhost WebSocket only (127.0.0.1)
Output path validation prevents writing outside PILOT_OUTPUT_DIR
Path traversal protection on all file operations
Expression size limit (50KB) on pilot_evaluate

Development

npm test   # unit tests via vitest

Credits

The core browser automation architecture — ref-based element selection, snapshot diffing, cursor-interactive scanning, annotated screenshots, circular buffers, and AI-friendly error translation — is ported from gstack by Garry Tan.

Built on Playwright by Microsoft and the Model Context Protocol SDK by Anthropic.

If pilot is useful to you, star the repo — it helps others find it.