Auto Browser
Give your AI agent a real browser — with a human in the loop.
Open-source MCP-native browser agent for authorized workflows.
Works with:
- Claude Desktop
- Cursor
- any MCP client that can speak JSON-RPC tools
- direct REST callers when you want curl-first control
Why Auto Browser?
- MCP-native, not bolted on later. Use it from Claude Desktop, Cursor, or any MCP client.
- Human takeover when the web gets weird. noVNC lets you recover from brittle flows without losing the session.
- Login once, reuse later. Save named auth profiles and reopen fresh sessions already signed in.
If you want one clean mental model, this repo is:
browser agent as an MCP server
If Auto Browser is useful, a ⭐ helps others find it.
3-command quickstart
git clone https://github.com/LvcidPsyche/auto-browser.git
cd auto-browser
docker compose up --build
That works with zero config for local dev.
Optional sanity check:
make doctor
make doctor needs local Docker access and the ability to open localhost sockets.
Open:
- API docs: http://localhost:8000/docs
- Operator Dashboard: http://localhost:8000/ui/
- Visual takeover: http://localhost:6080/vnc.html?autoconnect=true&resize=scale
All published ports bind to 127.0.0.1 by default.
Only copy .env.example if you want to change ports, providers, or allowed hosts:
cp .env.example .env
To see the rest of the common commands:
make help
What’s new in v0.5.1
Maintenance release — no API changes, all fixes are backwards compatible.
- network_inspector pending leak fixed — in-flight requests are now flushed as failed when a session is detached (tab close, crash), preventing unbounded memory growth
- Global KeyError → 404 handler — all store-layer KeyError raises are now handled uniformly; ~30 route handlers simplified
- _WithApproval mixin — 9 social action models and UploadRequest no longer repeat approval_id: str | None = None
- _MarkInterruptedMixin — mark_all_active_interrupted extracted from the three session store classes that each had identical copies
- utils.utc_now() — shared ISO-8601 timestamp helper; _timestamp() removed from 5 modules
- tool_inputs.py — Pydantic input models split from tool_gateway.py (dispatch logic vs. schema definitions)
- create_session decomposed — 190-line method split into 4 focused private helpers
- agent_jobs.py cleanup — dead hasattr guard deleted; enqueue_step/enqueue_run merged
All 149 tests pass.
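The timestamp consolidation above is small enough to sketch. Assuming it follows the usual stdlib idiom (the shipped utils.utc_now() may differ in detail), a shared helper looks like:

```python
from datetime import datetime, timezone

def utc_now() -> str:
    """Current time as a timezone-aware ISO-8601 UTC string."""
    return datetime.now(timezone.utc).isoformat()
```

Centralizing this keeps every module emitting the same timestamp shape instead of five slightly different `_timestamp()` copies.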
What’s new in v0.5.0
- CDP Connect Mode — attach to an existing Chrome via --remote-debugging-port instead of launching a new one
- Network Inspector — per-session request/response capture with header masking and PII scrubbing
- PII Scrubbing Layer — 16 pattern classes (AWS keys, JWTs, credit cards, SSNs, emails…); pixel redaction on screenshots; console + network body scrubbing
- Proxy Partitioning — named proxy personas for per-agent static IPs, preventing shared network footprints
- Shadow Browsing — flip a headless session to a headed (visible) browser mid-run for live debugging
- Session Forking — branch a session’s auth state (cookies + storage) into a new independent session
- Playwright Script Export — GET /sessions/{id}/export-script downloads the session as runnable Python
- Shared Session Links — HMAC-signed, TTL-enforced observer tokens for team handoffs
- Vision-Grounded Targeting — browser.find_by_vision uses Claude Vision to locate elements by natural language description
- Cron + Webhook Triggers — APScheduler-backed autonomous jobs; HMAC webhook keys; full CRUD at /crons
- MCP Resources Protocol — resources/list + resources/read expose live screenshot, DOM, console, and network log as MCP resources
- 30+ new MCP tools — eval_js, get_html, find_elements, drag_drop, set_viewport, cookies/storage R/W, and more
See CHANGELOG.md for the full list.
What’s included
- a browser node with Chromium, Xvfb, x11vnc, and noVNC
- a controller API built on FastAPI + Playwright
- screen-aware observations with screenshots and interactable element IDs
- optional OCR excerpts from screenshots via Tesseract
- human takeover through noVNC
- artifact capture for screenshots, traces, and storage state
- optional encrypted auth-state storage with max-age enforcement on restore
- reusable named auth profiles for login-once, reuse-later workflows
- basic policy rails with host allowlists and upload approval gates
- durable session metadata under /data/sessions, with optional Redis backing
- durable agent job records under /data/jobs with background workers for queued step/run requests
- audit events with per-request operator identity headers
- optional SQLite backing for approvals + audit events
- optional built-in REST agent runner for OpenAI, Claude, and Gemini
- one-step and multi-step REST agent orchestration endpoints
- richer browser abilities through the shared action schema: hover, select_option, wait, reload, back, forward
- tab awareness and tab controls for popup-heavy workflows
- download capture with session-scoped files and URLs under /artifacts
- optional session-level proxy routing and custom user agents for controlled network paths
- social page helpers for feed scrolling, post/profile extraction, search, and approval-gated write actions
- a browser-node managed Playwright server endpoint so the controller connects over Playwright protocol instead of CDP
- optional docker-ephemeral per-session browser isolation with dedicated noVNC ports
- a real MCP JSON-RPC transport at /mcp, plus convenience endpoints at /mcp/tools + /mcp/tools/call
- CDP connect mode — attach to an existing Chrome instance instead of launching a new one
- network inspector — per-session request/response capture with PII scrubbing and header masking
- PII scrubbing layer — 16 pattern classes with Pillow pixel redaction on screenshots
- proxy partitioning — named proxy personas for per-agent static IP assignment
- shadow browsing — flip headless → headed mid-run for live visual debugging
- session forking — clone auth state into a new independent session branch
- Playwright script export — download any session as a runnable .py file
- shared session links — HMAC-signed, TTL-bound observer tokens
- vision-grounded targeting — Claude Vision locates elements by natural language
- cron + webhook triggers — autonomous scheduled browser jobs via APScheduler
- MCP Resources Protocol — live screenshot, DOM, console, network as browser:// resources
- 30+ MCP tools — eval_js, get_html, find_elements, drag_drop, cookies/storage R/W, and more
It is intentionally not a stealth or anti-bot system. It is for operator-assisted browser workflows on sites and accounts you are authorized to use.
Good fits
- internal dashboards and admin tools
- agent-assisted QA and browser debugging
- login-once, reuse-later account workflows
- export/download/report flows
- brittle sites where a human may need to step in
- MCP-powered agent workflows that need a real browser
Not the goal
- anti-bot bypass
- CAPTCHA solving
- stealth/evasion work
- unauthorized scraping or account automation
Architecture at a glance
flowchart LR
User[Human operator] -->|watch / takeover| noVNC[noVNC]
LLM[OpenAI / Claude / Gemini] -->|shared tools| Controller[Controller API]
Controller -->|Playwright protocol| Browser[Browser node]
noVNC --> Browser
Browser --> Artifacts[(screenshots / traces / auth state)]
Controller --> Artifacts
Controller --> Policy[Allowlist + approval gates]
See:
- docs/architecture.md for the full design
- docs/llm-adapters.md for the model-facing action loop
- docs/mcp-clients.md for MCP client integration notes
- docs/production-hardening.md for the production target/spec
- docs/deployment.md for the deployment and credential handoff checklist
- docs/good-first-issues.md for contributor-friendly starter work
- examples/README.md for curl-first examples
- ROADMAP.md for project direction
- CODE_OF_CONDUCT.md for community expectations
- CONTRIBUTING.md if you want to help
Quick demo flow
The fastest way to understand the project:
- create a session
- observe the page
- take over visually if needed
- save an auth profile
- reopen a new session from that saved profile
That flow is what makes the project actually useful in day-to-day work.
If you want the shortest copy-paste curl walkthrough for that pattern, start with:
examples/login-and-save-profile.md
Real demo flow
The simplest high-signal demo for this project is:
- log into Outlook once
- save the browser state as outlook-default
- open a fresh session from auth_profile: "outlook-default"
- continue work without reauthing
That is the clearest example of why this is more useful than plain browser automation.
MCP usage
Auto Browser exposes a real MCP transport at:
/mcp
It also exposes convenience tool endpoints at:
/mcp/tools
/mcp/tools/call
That means you can use it as:
- a local browser tool server for MCP clients
- a supervised browser backend for agent frameworks
- a plain REST API if you want to script it directly
The differentiator is not just “browser automation.”
The differentiator is a browser agent that is already packaged as an MCP server.
MCP transport modes
- HTTP MCP server at http://127.0.0.1:8000/mcp
- stdio bridge at scripts/mcp_stdio_bridge.py
Most MCP clients still default to stdio. Auto Browser now ships the bridge out of the box, so you do not need a separate compatibility layer.
Claude Desktop quickstart
Copy examples/claude_desktop_config.json and replace <ABSOLUTE_PATH_TO_AUTO_BROWSER> with your real clone path:
{
"mcpServers": {
"auto-browser": {
"command": "python3",
"args": [
"<ABSOLUTE_PATH_TO_AUTO_BROWSER>/scripts/mcp_stdio_bridge.py"
],
"env": {
"AUTO_BROWSER_BASE_URL": "http://127.0.0.1:8000/mcp",
"AUTO_BROWSER_BEARER_TOKEN": ""
}
}
}
}
Then:
- start Auto Browser with docker compose up --build
- optional manual bridge command: make stdio-bridge
- paste that config into Claude Desktop
- restart Claude Desktop
- use the auto-browser MCP server through stdio
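Conceptually, the stdio bridge is a small relay: read one JSON-RPC message per line from stdin, POST it to the HTTP /mcp endpoint, write the reply to stdout. A stripped-down sketch of that loop — the shipped scripts/mcp_stdio_bridge.py additionally handles session headers, notifications, and error cases:

```python
import json
import urllib.request

def forward_http(base_url: str, message: dict) -> dict:
    """POST one JSON-RPC message to the HTTP MCP endpoint and decode the reply."""
    req = urllib.request.Request(
        base_url,
        data=json.dumps(message).encode(),
        headers={"content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def bridge(lines, send, out):
    """Relay newline-delimited JSON-RPC messages through `send`, echoing replies."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        reply = send(json.loads(line))
        out.write(json.dumps(reply) + "\n")
```

In real use, `lines` is sys.stdin, `out` is sys.stdout, and `send` wraps forward_http pointed at http://127.0.0.1:8000/mcp; injecting `send` keeps the relay loop testable without a running server.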
Tool surface
The default MCP tool profile exposes 32 tools covering:
- session lifecycle, navigation, observation
- click, type, hover, scroll, select, drag-drop, eval JS
- screenshot, DOM access, cookies, local/session storage
- network log inspection, console log access
- auth profiles, proxy personas, session forking
- vision-grounded element targeting
- cron job management, shared session links
- Playwright script export, shadow browsing
Internal queue/provider/admin tools are hidden by default.
If you want the entire internal tool surface, set:
MCP_TOOL_PROFILE=full
Why this is free
Auto Browser is designed to be free to use because it is:
- open-source
- self-hosted
- local-first
- bring-your-own browser/runtime
- bring-your-own model/provider
There is no required hosted control plane in the core project.
One-command readiness check
For a quick VPS sanity check before a live session:
make doctor
Run it from a normal terminal or any shell that has local Docker/localhost access.
For host-side controller tests instead of Docker:
python3.11 -m pip install -e './controller[dev]'
make test-local
Host-side controller workflows use Python 3.11+.
For a fuller pre-release pass that validates docs, compose config, tests, and the live smoke:
make release-audit
That script:
- picks alternate local ports automatically if 8000, 6080, or 5900 are already occupied
- waits for /readyz
- prints provider readiness
- runs a real create-session + observe smoke
- runs one agent-step smoke when the chosen provider is configured
- loads the repo-local .env so ambient shell secrets do not accidentally override the repo's configured values
If you also want it to rebuild the images first:
DOCTOR_BUILD=1 make doctor
If you are using OPENAI_AUTH_MODE=host_bridge, make sure the Codex bridge is already running first.
If you want the controller API itself protected, set API_BEARER_TOKEN and send:
Authorization: Bearer <token>
Optional operator headers:
X-Operator-Id: alice
X-Operator-Name: Alice Example
Set REQUIRE_OPERATOR_ID=true if every non-health request must carry an operator ID.
Production-mode minimums
For a real private beta, set at least:
APP_ENV=production
API_BEARER_TOKEN=<strong-random-secret>
REQUIRE_OPERATOR_ID=true
AUTH_STATE_ENCRYPTION_KEY=<44-char-fernet-key>
REQUIRE_AUTH_STATE_ENCRYPTION=true
REQUEST_RATE_LIMIT_ENABLED=true
METRICS_ENABLED=true
The controller now fails closed on startup in production mode if the required security settings are missing.
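A Fernet key is 32 random bytes, url-safe base64-encoded, which is where the 44-character length comes from. The canonical route is cryptography.fernet.Fernet.generate_key(); an equivalent stdlib-only sketch:

```python
import base64
import secrets

def generate_fernet_key() -> str:
    """32 cryptographically random bytes, urlsafe-base64 → the 44-char key Fernet expects."""
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).decode()
```

Paste the output into AUTH_STATE_ENCRYPTION_KEY; anything that is not exactly this shape will be rejected by Fernet at startup.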
Provider auth modes
By default the controller talks to vendor APIs directly with API keys.
If you already use subscription-backed CLIs instead, Auto Browser can route provider decisions through:
- codex for OpenAI
- claude for Anthropic / Claude Code
- gemini for Gemini CLI
Set the auth modes explicitly:
OPENAI_AUTH_MODE=cli
CLAUDE_AUTH_MODE=cli
GEMINI_AUTH_MODE=cli
CLI_HOME=/data/cli-home
Then populate data/cli-home with the auth caches from the machine where those CLIs are already signed in:
mkdir -p data/cli-home
rsync -a ~/.codex/ data/cli-home/.codex/
cp ~/.claude.json data/cli-home/.claude.json
rsync -a ~/.claude/ data/cli-home/.claude/
rsync -a ~/.gemini/ data/cli-home/.gemini/
If you just want to sign in interactively on this host, use the included bootstrap helper instead. It is meant for the default writable /data/... auth-cache flow and opens the CLI inside the controller image with HOME=$CLI_HOME (normally /data/cli-home), so the login state lands exactly where Auto Browser expects it:
./scripts/bootstrap_cli_auth.sh codex
./scripts/bootstrap_cli_auth.sh claude
./scripts/bootstrap_cli_auth.sh gemini
# or
./scripts/bootstrap_cli_auth.sh all
If this box already has those subscription logins locally, the smoother path is to mount the real host homes read-only at their native paths instead of copying caches around:
CLI_HOST_HOME=/home/youruser \
OPENAI_AUTH_MODE=cli \
CLAUDE_AUTH_MODE=cli \
GEMINI_AUTH_MODE=cli \
docker compose -f docker-compose.yml -f docker-compose.host-subscriptions.yml up --build
That override:
- mounts ~/.codex, ~/.claude, ~/.claude.json, and ~/.gemini read-only
- sets CLI_HOME to the host-style home path inside the container
- behaves much more like running the CLIs directly on the host
If your host home is not /home/youruser, set CLI_HOST_HOME first. Do not use bootstrap_cli_auth.sh in this mode; sign in on the host first and then start the override.
If Codex subscription auth still does not survive inside Docker cleanly, use the host-side bridge instead. It runs codex on the host and exposes a Unix socket through the shared ./data mount:
mkdir -p data/host-bridge
python3 scripts/codex_host_bridge.py --socket-path data/host-bridge/codex.sock
If you want it to behave more like a persistent host skill, install the included user-service template once:
mkdir -p ~/.config/systemd/user
cp ops/systemd/codex-host-bridge.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now codex-host-bridge.service
Then start the controller with:
OPENAI_AUTH_MODE=host_bridge \
OPENAI_HOST_BRIDGE_SOCKET=/data/host-bridge/codex.sock \
docker compose up --build
That gives OpenAI/Codex the closest behavior to a host-side skill, because the actual CLI stays on the host instead of inside the container.
Notes:
- the bridge socket is now health-checked, not just path-checked
- host codex requests are killed after 55s by default so the bridge does not leak orphaned CLI jobs
- the bridge is a local trust boundary: anyone who can talk to that Unix socket can make the host run codex exec
- keep data/host-bridge private to trusted local users/processes only
- keep data/cli-home private; it contains live auth material
- API keys are still the better default for CI/public automation
- CLI auth is aimed at trusted single-tenant boxes, e.g. a personal VPS behind Tailscale
If you want true per-session browser isolation, use the compose override:
docker compose -f docker-compose.yml -f docker-compose.isolation.yml up --build
That keeps the default shared browser-node available, but new sessions are provisioned as one-off browser containers with their own noVNC ports when SESSION_ISOLATION_MODE=docker_ephemeral.
Raise MAX_SESSIONS above 1 if you want multiple isolated sessions live at once.
The existing reverse-SSH sidecar still only tunnels the controller API plus the shared browser-node noVNC port.
If isolated session noVNC ports are only bound locally, enable the controller-managed ISOLATED_TUNNEL_* settings to open a reverse-SSH tunnel per session.
If you already have direct host reachability, set ISOLATED_TAKEOVER_HOST to a host humans can actually reach and skip the extra tunnel broker.
When the controller brokers an isolated-session tunnel, it targets the per-session browser container over the Docker network by default instead of hairpinning back through a host-published port.
For remote access, you now have two sane paths:
- put the stack behind Tailscale / Cloudflare Access
- run the optional reverse-SSH sidecar and point TAKEOVER_URL at the forwarded noVNC URL
If 8000, 6080, or 5900 are already taken on the host, override them inline:
API_PORT=8010 NOVNC_PORT=6081 VNC_PORT=5901 \
TAKEOVER_URL='http://127.0.0.1:6081/vnc.html?autoconnect=true&resize=scale' \
docker compose up --build
Shared action schema and download API
Beyond the convenience routes (/actions/click, /actions/type, etc.), the controller now exposes:
POST /sessions/{session_id}/actions/execute
- accepts the full shared BrowserActionDecision schema
- supports hover, select_option, wait, reload, go_back, and go_forward
GET /sessions/{session_id}/tabs
- lists the currently open pages in the session
POST /sessions/{session_id}/tabs/activate
- makes a tab the primary page for future observations/actions
POST /sessions/{session_id}/tabs/close
- closes a tab by index and rebinds the session to the active tab
GET /sessions/{session_id}/downloads
- lists files captured for that session
- download files are saved under the session artifact tree and served from /artifacts/...
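A client-side sketch of preparing an execute payload. The top-level field names here are assumptions for illustration, not the authoritative BrowserActionDecision schema; only the action-type list comes from the endpoint description above:

```python
import json

# Action types supported by the execute endpoint.
SUPPORTED_ACTIONS = {
    "click", "type", "hover", "select_option",
    "wait", "reload", "go_back", "go_forward",
}

def build_action(action: str, **fields) -> str:
    """Serialize one action decision, rejecting unsupported action types up front."""
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return json.dumps({"action": action, **fields})
```

Validating the action type client-side turns a typo'd action into an immediate ValueError instead of a round trip to the controller.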
Reverse SSH remote access
This repo now includes an optional reverse-ssh profile that forwards:
- controller API 8000 -> remote port REVERSE_SSH_REMOTE_API_PORT
- noVNC 6080 -> remote port REVERSE_SSH_REMOTE_NOVNC_PORT
Setup:
mkdir -p data/ssh data/tunnels
chmod 700 data/ssh
cp ~/.ssh/id_ed25519 data/ssh/id_ed25519
chmod 600 data/ssh/id_ed25519
ssh-keyscan -p 22 bastion.example.com > data/ssh/known_hosts
Then set these in .env:
REVERSE_SSH_HOST=bastion.example.com
REVERSE_SSH_USER=browserbot
REVERSE_SSH_PORT=22
REVERSE_SSH_REMOTE_BIND_ADDRESS=127.0.0.1
REVERSE_SSH_REMOTE_API_PORT=18000
REVERSE_SSH_REMOTE_NOVNC_PORT=16080
REVERSE_SSH_ACCESS_MODE=private
TAKEOVER_URL=http://bastion.example.com:16080/vnc.html?autoconnect=true&resize=scale
Start it:
docker compose --profile reverse-ssh up --build
Notes:
- default remote bind is 127.0.0.1 on the SSH server. That is safer.
- the sidecar refuses non-local reverse binds unless REVERSE_SSH_ALLOW_NONLOCAL_BIND=true.
- REVERSE_SSH_ACCESS_MODE=private is the default. That means bastion-only unless you front it with Tailscale or Cloudflare Access.
- REVERSE_SSH_ACCESS_MODE=cloudflare-access expects REVERSE_SSH_PUBLIC_SCHEME=https.
- non-local reverse binds are only allowed in REVERSE_SSH_ACCESS_MODE=unsafe-public. That is intentionally loud because GatewayPorts exposure is easy to get wrong.
- the sidecar writes connection metadata to data/tunnels/reverse-ssh.json.
- the sidecar refreshes that metadata on a heartbeat, and the controller marks stale tunnel metadata as inactive.
Run the local reverse-SSH smoke test
This repo includes a self-contained smoke harness with a disposable SSH bastion container:
./scripts/smoke_reverse_ssh.sh
If 8000 is busy on the host, run the smoke with an override like API_PORT=8010 ./scripts/smoke_reverse_ssh.sh.
It verifies:
- controller /remote-access
- forwarded API through the bastion
- forwarded noVNC through the bastion
- session create + observe through the forwarded API
Run the local isolated-session smoke test
This repo also includes a smoke harness for per-session docker isolation:
./scripts/smoke_isolated_session.sh
If the default controller port is busy, run API_PORT=8010 ./scripts/smoke_isolated_session.sh.
It verifies:
- controller readiness with the isolation override enabled
- session create in docker_ephemeral mode
- dedicated per-session noVNC port wiring
- session-scoped remote_access metadata
- observe + close flow
- isolated browser container cleanup after close
Run the local isolated-session tunnel smoke test
This repo also includes a smoke harness for controller-managed reverse tunnels on isolated session takeover ports:
./scripts/smoke_isolated_session_tunnel.sh
If the default controller port is busy, run API_PORT=8010 ./scripts/smoke_isolated_session_tunnel.sh.
It verifies:
- controller-managed isolated session tunnel provisioning against the disposable bastion
- session-specific remote-access payloads flipping to active
- remote noVNC reachability from the bastion on the assigned per-session port
- isolated tunnel teardown on session close
Check configured model providers
curl -s http://localhost:8000/agent/providers | jq
Each provider entry reports:
- configured
- auth_mode (api or cli)
- model
- detail with the concrete readiness reason or missing prerequisite
Inspect active remote-access metadata
curl -s http://localhost:8000/remote-access | jq
curl -s 'http://localhost:8000/remote-access?session_id=<session-id>' | jq
If the reverse-SSH sidecar is running, observations and session summaries will automatically return the forwarded takeover_url from data/tunnels/reverse-ssh.json.
For isolated sessions, the remote_access payload becomes session-specific so you can see whether that session’s own noVNC URL is still local-only, directly reachable, or being served through a controller-managed session tunnel.
Create a session
curl -s http://localhost:8000/sessions \
-X POST \
-H 'content-type: application/json' \
-d '{"name":"demo","start_url":"https://example.com"}' | jq
Observe the page
curl -s http://localhost:8000/sessions/<session-id>/observe | jq
The response includes:
- current URL and title
- a page-level text_excerpt
- a compact dom_outline with headings, forms, and element counts
- an accessibility_outline distilled from Playwright’s accessibility tree
- an ocr payload with screenshot text excerpts and bounding boxes
- a screenshot path and artifact URL
- interactable elements with observation-scoped element_id values
- recent console errors
- the effective noVNC takeover URL
- remote-access metadata when a tunnel sidecar is active
- explicit isolation metadata, including per-session auth/upload roots and the shared-browser-node limit
Click by element_id
curl -s http://localhost:8000/sessions/<session-id>/actions/click \
-X POST \
-H 'content-type: application/json' \
-d '{"element_id":"op-abc123"}' | jq
Type into an input
curl -s http://localhost:8000/sessions/<session-id>/actions/type \
-X POST \
-H 'content-type: application/json' \
-d '{"selector":"input[name=q]","text":"playwright mcp","clear_first":true}' | jq
For passwords, OTPs, or other secrets, set "sensitive": true so action logs redact the typed value preview:
curl -s http://localhost:8000/sessions/<session-id>/actions/type \
-X POST \
-H 'content-type: application/json' \
-d '{"selector":"input[type=password]","text":"super-secret","clear_first":true,"sensitive":true}' | jq
Hover over an element
curl -s http://localhost:8000/sessions/<session-id>/actions/hover \
-X POST \
-H 'content-type: application/json' \
-d '{"selector":"#dropdown-trigger"}' | jq
Use coordinates instead: {"x": 640, "y": 360}
Select a dropdown option
curl -s http://localhost:8000/sessions/<session-id>/actions/select-option \
-X POST \
-H 'content-type: application/json' \
-d '{"selector":"select#size","value":"large"}' | jq
Also accepts label (visible text) or index (0-based position).
Wait, reload, and navigate history
# Wait 1.5 seconds
curl -s http://localhost:8000/sessions/<session-id>/actions/wait \
-X POST -H 'content-type: application/json' -d '{"wait_ms":1500}' | jq
# Reload the current page
curl -s http://localhost:8000/sessions/<session-id>/actions/reload \
-X POST | jq
# Browser back / forward
curl -s http://localhost:8000/sessions/<session-id>/actions/go-back -X POST | jq
curl -s http://localhost:8000/sessions/<session-id>/actions/go-forward -X POST | jq
Save auth state for later reuse
curl -s http://localhost:8000/sessions/<session-id>/storage-state \
-X POST \
-H 'content-type: application/json' \
-d '{"path":"demo-auth.json"}' | jq
That path is now saved under the session’s own auth root:
/data/auth/<session-id>/demo-auth.json
If AUTH_STATE_ENCRYPTION_KEY is set, the controller saves:
/data/auth/<session-id>/demo-auth.json.enc
Restores enforce AUTH_STATE_MAX_AGE_HOURS, so stale auth-state files are rejected instead of silently reused.
Inspect the current auth-state metadata:
curl -s http://localhost:8000/sessions/<session-id>/auth-state | jq
Save a reusable auth profile
Auth profiles live under /data/auth/profiles/<profile-name>/ and are not cleaned up by routine retention jobs.
curl -s http://localhost:8000/sessions/<session-id>/auth-profiles \
-X POST \
-H 'content-type: application/json' \
-d '{"profile_name":"outlook-default"}' | jq
List saved profiles:
curl -s http://localhost:8000/auth-profiles | jq
curl -s http://localhost:8000/auth-profiles/outlook-default | jq
Start a new session from a saved profile:
curl -s http://localhost:8000/sessions \
-X POST \
-H 'content-type: application/json' \
-d '{"name":"outlook-resume","auth_profile":"outlook-default","start_url":"https://outlook.live.com/mail/0/"}' | jq
Outlook login + save workflow
This is the simplest pattern for “human login once, then reuse later”.
curl -s http://localhost:8000/sessions \
-X POST \
-H 'content-type: application/json' \
-d '{"name":"outlook-login","start_url":"https://login.live.com/"}' | jq
Then log in and save the profile in one step:
curl -s http://localhost:8000/sessions/<session-id>/social/login \
-X POST \
-H 'content-type: application/json' \
-d '{
"platform":"outlook",
"username":"[email protected]",
"password":"REDACTED",
"auth_profile":"outlook-default"
}' | jq
If Microsoft throws a human verification wall, use the returned takeover_url, finish the challenge manually in noVNC, then save the profile:
curl -s http://localhost:8000/sessions/<session-id>/auth-profiles \
-X POST \
-H 'content-type: application/json' \
-d '{"profile_name":"outlook-default"}' | jq
Stage upload files
This POC expects upload files to be staged on disk first:
cp ~/Downloads/example.pdf data/uploads/
For cleaner isolation, you can also stage per-session files under:
data/uploads/<session-id>/
Then request and execute approval through the queue:
curl -s http://localhost:8000/sessions/<session-id>/actions/upload \
-X POST \
-H 'content-type: application/json' \
-d '{"selector":"input[type=file]","file_path":"example.pdf"}' | jq
That returns 409 with a pending approval payload. Then:
curl -s http://localhost:8000/approvals/<approval-id>/approve \
-X POST \
-H 'content-type: application/json' \
-d '{"comment":"approved"}' | jq
curl -s http://localhost:8000/approvals/<approval-id>/execute \
-X POST | jq
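Scripted clients usually wrap this 409 → approve → execute dance in a tiny state machine. A sketch with the HTTP call injected so the flow itself is visible; the endpoint paths follow the curls above, but the `approval_id` field name in the 409 body is an assumption:

```python
def run_with_approval(post, session_id: str, payload: dict) -> dict:
    """Try an upload; if it parks as a pending approval, approve then execute it.

    `post(path, body)` is any callable returning (status_code, response_dict).
    """
    status, body = post(f"/sessions/{session_id}/actions/upload", payload)
    if status != 409:
        return body  # executed directly; no approval gate was hit
    approval_id = body["approval_id"]  # assumption: field name in the 409 payload
    post(f"/approvals/{approval_id}/approve", {"comment": "approved"})
    status, body = post(f"/approvals/{approval_id}/execute", None)
    return body
```

Injecting `post` keeps the gate logic testable; in practice it would wrap requests or httpx with your bearer token and operator headers.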
Inspect approvals
curl -s http://localhost:8000/approvals | jq
curl -s http://localhost:8000/approvals/<approval-id> | jq
Ask a provider for one next step
curl -s http://localhost:8000/sessions/<session-id>/agent/step \
-X POST \
-H 'content-type: application/json' \
-d '{
"provider":"openai",
"goal":"Open the main link on the page and stop.",
"observation_limit":25
}' | jq
Let a provider run a short loop
curl -s http://localhost:8000/sessions/<session-id>/agent/run \
-X POST \
-H 'content-type: application/json' \
-d '{
"provider":"claude",
"goal":"Fill the search field with playwright mcp and stop before submitting.",
"max_steps":4
}' | jq
If a model proposes an upload, post/send, payment, account change, or destructive step, the run now stops with status=approval_required and writes a queued approval item instead of executing the side effect.
Queue agent work for background execution
curl -s http://localhost:8000/sessions/<session-id>/agent/jobs/step \
-X POST \
-H 'content-type: application/json' \
-d '{
"provider":"openai",
"goal":"Inspect the page and stop."
}' | jq
curl -s http://localhost:8000/sessions/<session-id>/agent/jobs/run \
-X POST \
-H 'content-type: application/json' \
-d '{
"provider":"claude",
"goal":"Open the first result and summarize it.",
"max_steps":4
}' | jq
curl -s http://localhost:8000/agent/jobs | jq
curl -s http://localhost:8000/agent/jobs/<job-id> | jq
Queued jobs are persisted under /data/jobs. If the controller restarts mid-run, any previously running jobs are marked interrupted on startup instead of disappearing.
Audit trail and operator identity
curl -s http://localhost:8000/operator | jq
curl -s 'http://localhost:8000/audit/events?limit=20' | jq
curl -s 'http://localhost:8000/audit/events?session_id=<session-id>' | jq
Audit events are written to /data/audit/events.jsonl.
If STATE_DB_PATH is set, approvals and audit events are also stored in SQLite and served from there. AUDIT_MAX_EVENTS caps retained audit rows/events in both SQLite and the mirrored JSONL file.
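Because audit events land in a JSONL file, ad-hoc filtering needs nothing beyond the stdlib. A sketch mirroring the ?session_id= query, assuming each event line carries a session_id field:

```python
import json
from pathlib import Path

def events_for_session(path: str, session_id: str) -> list[dict]:
    """Read an events.jsonl file and keep only events for one session."""
    events = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("session_id") == session_id:
            events.append(event)
    return events
```

This is handy when the controller is down and you only have the /data volume to inspect.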
Metrics and cleanup
curl -s http://localhost:8000/metrics | head
curl -s http://localhost:8000/maintenance/status | jq
curl -s http://localhost:8000/maintenance/cleanup \
-X POST \
-H "Authorization: Bearer <token>" \
-H "X-Operator-Id: ops" | jq
The controller can now:
- expose Prometheus-style request/session metrics at /metrics
- prune stale artifacts, uploads, and auth-state files on startup and on a configurable interval
If METRICS_ENABLED=false, /metrics returns 404.
MCP browser gateway
Convenience endpoints still exist:
curl -s http://localhost:8000/mcp/tools | jq
curl -s http://localhost:8000/mcp/tools/call \
-X POST \
-H 'content-type: application/json' \
-d '{
"name":"browser.observe",
"arguments":{"session_id":"<session-id>","limit":20}
}' | jq
The controller now also exposes a real MCP-style JSON-RPC session transport at /mcp:
INIT=$(curl -si http://localhost:8000/mcp \
-X POST \
-H 'content-type: application/json' \
-d '{
"jsonrpc":"2.0",
"id":1,
"method":"initialize",
"params":{
"protocolVersion":"2025-11-25",
"clientInfo":{"name":"demo-client","version":"0.1.0"},
"capabilities":{}
}
}')
SESSION_ID=$(printf "%s" "$INIT" | awk -F": " '/^MCP-Session-Id:/ {print $2}' | tr -d '\r')
curl -s http://localhost:8000/mcp \
-X POST \
-H "content-type: application/json" \
-H "MCP-Session-Id: $SESSION_ID" \
-H "MCP-Protocol-Version: 2025-11-25" \
-d '{"jsonrpc":"2.0","method":"notifications/initialized","params":{}}'
curl -s http://localhost:8000/mcp \
-X POST \
-H "content-type: application/json" \
-H "MCP-Session-Id: $SESSION_ID" \
-H "MCP-Protocol-Version: 2025-11-25" \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' | jq
Notes:
- this transport supports `initialize`, `notifications/initialized`, `ping`, `tools/list`, `tools/call`, and `DELETE /mcp` session teardown
- JSON-RPC batching is intentionally rejected
- if a browser client sends an `Origin` header, set `MCP_ALLOWED_ORIGINS` to the exact allowed origins
Project layout
auto-browser/
├── browser-node/ # headed Chromium + noVNC image
├── controller/ # FastAPI + Playwright control plane
├── data/ # artifacts, uploads, auth state, durable session/job records, profile data
├── reverse-ssh/ # optional autossh sidecar for private remote access
├── docker-compose.yml
├── docker-compose.isolation.yml
└── docs/
├── architecture.md
└── llm-adapters.md
Opinionated defaults
- Keep Playwright as the execution engine.
- Use screenshots + DOM/interactable metadata together.
- Use noVNC/xpra-style takeover when a flow gets brittle.
- Use one session per account/workflow.
- Never automate with your daily browser profile.
- Keep one active session per browser node in this POC because takeover is tied to one visible desktop.
- If you need parallel sessions, switch to `docker_ephemeral` isolation so each live session gets its own browser container and takeover port.
- Keep a durable session registry even in the POC so restarts downgrade active sessions to interrupted instead of losing them.
- Treat each session’s auth/upload roots as isolated working state even though the visible desktop is still shared.
- Encrypt auth-state at rest once you move beyond localhost demos.
- Require operator IDs once more than one human or worker touches the system.
Production upgrades after the POC
- replace raw local ports with Tailscale, Cloudflare Access, or a hardened bastion
- move session metadata from file/Redis into a richer Postgres model if you need querying and joins
- promote the docker-ephemeral path into one browser pod per account once you want scheduler-level isolation
- persist approvals in a database instead of flat files when the POC grows
- add per-operator identity / SSO on top of the approval queue
- add SSE streaming on top of the current MCP JSON-RPC transport if you need server-pushed events
References
- OpenAI Computer Use: https://developers.openai.com/api/docs/guides/tools-computer-use/
- Playwright Trace Viewer: https://playwright.dev/docs/trace-viewer
- Playwright BrowserType `connect`: https://playwright.dev/docs/api/class-browsertype
- Chrome for Testing: https://developer.chrome.com/blog/chrome-for-testing
- noVNC embedding: https://novnc.com/noVNC/docs/EMBEDDING.html
Provider environment variables
Set one or more providers before starting the stack:
- API mode: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`
- CLI mode: `OPENAI_AUTH_MODE=cli`, `CLAUDE_AUTH_MODE=cli`, `GEMINI_AUTH_MODE=cli`
The controller exposes provider readiness at GET /agent/providers.
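Readiness per provider reduces to "API key set, or CLI auth mode enabled". This sketch mirrors that rule using the variable names listed above; the actual response shape of `GET /agent/providers` may differ:

```python
# Env-var pairs from the README: API-key mode or CLI mode per provider.
PROVIDERS = {
    "openai": ("OPENAI_API_KEY", "OPENAI_AUTH_MODE"),
    "claude": ("ANTHROPIC_API_KEY", "CLAUDE_AUTH_MODE"),
    "gemini": ("GEMINI_API_KEY", "GEMINI_AUTH_MODE"),
}

def provider_ready(name: str, env: dict[str, str]) -> bool:
    """A provider is usable if its API key is set or CLI auth is enabled."""
    key_var, mode_var = PROVIDERS[name]
    return bool(env.get(key_var)) or env.get(mode_var) == "cli"

print(provider_ready("openai", {"OPENAI_API_KEY": "sk-..."}))  # True
print(provider_ready("gemini", {}))                            # False
```

Passing the environment as a dict (rather than reading `os.environ` directly) keeps the check testable; in the controller the same logic would run against the process environment.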
Optional provider resilience knobs:
- `MODEL_MAX_RETRIES`
- `MODEL_RETRY_BACKOFF_SECONDS`
Optional durable session-store knobs:
- `SESSION_STORE_ROOT`
- `REDIS_URL`
- `SESSION_STORE_REDIS_PREFIX`
Optional auth/audit/operator knobs:
- Audit/state: `AUDIT_ROOT`, `STATE_DB_PATH`, `AUDIT_MAX_EVENTS`, `MCP_ALLOWED_ORIGINS`
- Session isolation: `SESSION_ISOLATION_MODE`, `ISOLATED_BROWSER_IMAGE`, `ISOLATED_BROWSER_CONTAINER_PREFIX`, `ISOLATED_BROWSER_WAIT_TIMEOUT_SECONDS`, `ISOLATED_BROWSER_KEEP_CONTAINERS`, `ISOLATED_BROWSER_BIND_HOST`, `ISOLATED_TAKEOVER_HOST`, `ISOLATED_TAKEOVER_SCHEME`, `ISOLATED_TAKEOVER_PATH`, `ISOLATED_BROWSER_NETWORK`, `ISOLATED_HOST_DATA_ROOT`, `ISOLATED_DOCKER_HOST`
- Isolated tunnel: `ISOLATED_TUNNEL_ENABLED`, `ISOLATED_TUNNEL_HOST`, `ISOLATED_TUNNEL_PORT`, `ISOLATED_TUNNEL_USER`, `ISOLATED_TUNNEL_KEY_PATH`, `ISOLATED_TUNNEL_KNOWN_HOSTS_PATH`, `ISOLATED_TUNNEL_STRICT_HOST_KEY_CHECKING`, `ISOLATED_TUNNEL_REMOTE_BIND_ADDRESS`, `ISOLATED_TUNNEL_REMOTE_PORT_START`, `ISOLATED_TUNNEL_REMOTE_PORT_END`, `ISOLATED_TUNNEL_SERVER_ALIVE_INTERVAL`, `ISOLATED_TUNNEL_SERVER_ALIVE_COUNT_MAX`, `ISOLATED_TUNNEL_INFO_INTERVAL_SECONDS`, `ISOLATED_TUNNEL_STARTUP_GRACE_SECONDS`, `ISOLATED_TUNNEL_ACCESS_MODE`, `ISOLATED_TUNNEL_PUBLIC_HOST`, `ISOLATED_TUNNEL_PUBLIC_SCHEME`, `ISOLATED_TUNNEL_LOCAL_HOST`, `ISOLATED_TUNNEL_INFO_ROOT`
- Auth state: `AUTH_STATE_ENCRYPTION_KEY`, `REQUIRE_AUTH_STATE_ENCRYPTION`, `AUTH_STATE_MAX_AGE_HOURS`
- OCR: `OCR_ENABLED`, `OCR_LANGUAGE`, `OCR_MAX_BLOCKS`, `OCR_TEXT_LIMIT`
- Operator identity: `OPERATOR_ID_HEADER`, `OPERATOR_NAME_HEADER`, `REQUIRE_OPERATOR_ID`