bridgic-browser

agent
SUMMARY

LLM-driven browser automation library built on Playwright, with 67 CLI/SDK tools, stable snapshot refs, and stealth mode enabled by default, suited to end-to-end web operation by AI agents.

README.md



Bridgic Browser

Bridgic Browser is a Python library for LLM-driven browser automation built on Playwright. It includes CLI tools, Python tools and skills for AI agents.

Features

  • Comprehensive CLI Tools - 67 tools organized into 15 categories, designed to integrate with any AI agent
  • Python-based Tools - Used for agent / workflow code generation; integrate easily with Bridgic
  • Snapshot with Semantic Invariance - A page snapshot representation based on the accessibility tree, with a purpose-built ref-generation algorithm that keeps element refs unchanged across page reloads
  • Skills - Used for guided exploration and code generation; compatible with most coding agents
  • Stealth Mode (Enabled by Default) - Mode-aware anti-detection: 50+ Chrome args plus JS patches in headless mode; a minimal set of ~11 flags in headed mode to match a real Chrome fingerprint
  • Dual Launch Mode - Automatically switches between isolated sessions and persistent contexts
  • Nested iframe Support - Supports DOM element operations within multi-level nested iframes
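
The ref-generation algorithm itself is not described in this README. As a rough illustration of why a ref can survive a reload, the sketch below hashes only semantic properties of an element (role, accessible name, frame path, and an index among identical elements) and ignores volatile details such as generated DOM ids. The helper name make_ref, its inputs, and the use of SHA-1 are assumptions made for illustration, not the library's implementation.

```python
import hashlib


def make_ref(role: str, name: str, frame_path: str = "", nth: int = 0) -> str:
    """Hypothetical helper: derive a short, deterministic ref from semantic properties."""
    # Join the inputs with a separator that cannot appear in ordinary text.
    key = "\x1f".join([frame_path, role, name, str(nth)])
    # Keep the first 8 hex characters, matching the ref length shown in this README.
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]


# The same semantic description yields the same ref on every load.
first_load = make_ref("link", "Learn more")
after_reload = make_ref("link", "Learn more")
print(first_load == after_reload)  # True
```

Because nothing in the key depends on render order or session state, two loads of the same page produce the same value as long as the element's role and name do not change.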

Installation

pip install bridgic-browser

After installation, install Playwright browsers:

playwright install chromium

Quick Start

CLI Tools Usage

bridgic-browser open --headed https://example.com
bridgic-browser snapshot
# 'f0201d1c' is the ref value of the 'Learn more' link
bridgic-browser click f0201d1c
bridgic-browser screenshot page.png
bridgic-browser close

Python Tools Integration

First, build tools:

from bridgic.browser.session import Browser
from bridgic.browser.tools import BrowserToolSetBuilder, ToolCategory

# create a browser instance
browser = Browser(headless=False)

async def create_tools(browser):
    # Build a focused tool set for your agent
    builder = BrowserToolSetBuilder.for_categories(
        browser,
        ToolCategory.NAVIGATION,
        ToolCategory.SNAPSHOT,
        ToolCategory.ELEMENT_INTERACTION,
        ToolCategory.CAPTURE,
        ToolCategory.WAIT,
    )
    tools = builder.build()["tool_specs"]
    return tools

Second (optional), build a Bridgic agent that uses this tool set:

import os
from bridgic.llms.openai import OpenAILlm, OpenAIConfiguration
async def create_llm():
    _api_key = os.environ.get("OPENAI_API_KEY")
    _model_name = os.environ.get("OPENAI_MODEL_NAME")

    llm = OpenAILlm(
        api_key=_api_key,
        configuration=OpenAIConfiguration(model=_model_name),
        timeout=60,
    )
    return llm

from bridgic.core.agentic.recent import ReCentAutoma, StopCondition
from bridgic.core.automa import RunningOptions
async def create_agent(llm, tools):
    browser_agent = ReCentAutoma(
        llm=llm,
        tools=tools,
        stop_condition=StopCondition(max_iteration=10, max_consecutive_no_tool_selected=1),
        running_options=RunningOptions(debug=True),
    )
    return browser_agent

async def main():
    tools = await create_tools(browser)
    llm = await create_llm()
    agent = await create_agent(llm, tools)
    result = await agent.arun(
        goal=(
            "Summarize the 'Learn more' page of example.com for me"
        ),
        guidance=(
            "Do the following steps one by one:\n"
            "1. Navigate to https://example.com\n"
            "2. Click the 'Learn more' link\n"
            "3. Take a screenshot of the 'Learn more' page\n"
            "4. Summarize the page content in one sentence and tell me how to access the screenshot.\n"
        ),
    )
    print("\n\n*** Final Result: ***\n\n")
    print(result)

    await browser.close()

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

How to Install Skills?

The skills in this repo work with most coding agents and AI assistants, such as Claude Code, Cursor, and OpenClaw.
Install them using the npx skills CLI:

# From this repository checkout
npx skills add . --skill bridgic-browser

# Or from GitHub
npx skills add bitsky-tech/bridgic-browser --skill bridgic-browser

After installation, the Skill will appear in your project’s agent directories (for example, Claude Code typically under .claude/skills/bridgic-browser/, and Cursor under .agents/skills/bridgic-browser/).

Browser API Usage

You can also directly call the underlying Browser API to control the browser.

from bridgic.browser.session import Browser

browser = Browser(headless=False)

async def main():
    await browser.navigate_to("https://example.com")
    snapshot = await browser.get_snapshot()
    print(snapshot.tree)  # Tree format: - role "name" [ref=f0201d1c]
    # Look up the ref of the 'Learn more' link; it stays None if the link is not on the page
    learn_more_ref = None
    for ref, data in snapshot.refs.items():
        if data.name == "Learn more":
            learn_more_ref = ref
            break
    if learn_more_ref is None:
        print("No 'Learn more' element found in the snapshot")
    else:
        print(f"Found ref for 'Learn more': {learn_more_ref}")
        await browser.click_element_by_ref(learn_more_ref)
        await browser.take_screenshot(filename="page.png")
    await browser.close()

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
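
When only the tree string is available (for example, the text printed by the snapshot command), a ref can also be recovered by parsing it. The sketch below assumes each line follows the format shown in the comment above, - role "name" [ref=...]; the helper find_ref and the sample tree are made up for illustration, and real snapshot output may carry extra attributes that this pattern ignores.

```python
import re
from typing import Optional

# Matches lines such as: - link "Learn more" [ref=f0201d1c]
# The quoted name is optional because some nodes may have no accessible name.
LINE_RE = re.compile(
    r'^\s*-\s+(?P<role>\S+)(?:\s+"(?P<name>[^"]*)")?.*?\[ref=(?P<ref>[0-9a-fA-F]+)\]'
)


def find_ref(tree: str, role: str, name: str) -> Optional[str]:
    """Return the ref of the first node with the given role and name, or None."""
    for line in tree.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("role") == role and match.group("name") == name:
            return match.group("ref")
    return None


# Sample tree text, written by hand in the assumed format.
sample_tree = '''- heading "Example Domain" [ref=1f79fe5e]
- link "Learn more" [ref=f0201d1c]'''

print(find_ref(sample_tree, "link", "Learn more"))  # f0201d1c
```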

CLI Tools

bridgic-browser ships with a command-line interface for controlling a browser from the terminal (67 tools organized into 15 categories). A persistent daemon process holds a browser instance; each CLI invocation connects over a Unix domain socket and exits immediately.

Configuration

Browser options are read at daemon startup from the following sources, listed from lowest to highest priority (a later source overrides an earlier one):

Source | Example / description
Defaults | headless=True
~/.bridgic/bridgic-browser.json | User-level persistent config
./bridgic-browser.json | Project-local config (in cwd at daemon start)
Environment variables | See skills/bridgic-browser/references/env-vars.md

Headed browser note:
When headless=false and stealth is enabled, bridgic automatically switches to system Chrome (if installed) for better anti-detection, because Chrome for Testing is blocked by Google OAuth.
To override this, set one of:

  • channel: e.g. "chrome", "msedge"
  • executable_path: absolute path to a browser binary

The JSON sources accept any Browser constructor parameter:

{
  "headless": false,
  "proxy": {"server": "http://proxy:8080", "username": "u", "password": "p"},
  "viewport": {"width": 1280, "height": 720},
  "locale": "zh-CN",
  "timezone_id": "Asia/Shanghai"
}

# One-shot env override
BRIDGIC_BROWSER_JSON='{"headless":false,"locale":"zh-CN"}' bridgic-browser open URL
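
The layering of configuration sources can be pictured as a chain of dictionary updates in which each later source overwrites keys set by an earlier one. The sketch below is a minimal model under stated assumptions: it merges shallowly (whether the daemon merges nested keys such as proxy is not stated here), it models only the BRIDGIC_BROWSER_JSON variable, and the function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

DEFAULTS = {"headless": True}


def load_json_file(path: Path) -> dict:
    """Return the parsed JSON object, or an empty dict if the file does not exist."""
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def resolve_options(home: Path, cwd: Path, environ: dict) -> dict:
    """Hypothetical resolver: apply sources from lowest to highest priority."""
    options = dict(DEFAULTS)
    options.update(load_json_file(home / ".bridgic" / "bridgic-browser.json"))
    options.update(load_json_file(cwd / "bridgic-browser.json"))
    options.update(json.loads(environ.get("BRIDGIC_BROWSER_JSON", "{}")))
    return options


# Demo: a project-local file sets two keys, and the env override replaces one of them.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "bridgic-browser.json").write_text(
        '{"headless": false, "locale": "en-US"}', encoding="utf-8"
    )
    env = {"BRIDGIC_BROWSER_JSON": '{"locale": "zh-CN"}'}
    print(resolve_options(home=project / "no-home", cwd=project, environ=env))
    # {'headless': False, 'locale': 'zh-CN'}
```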

Command List

Category | Commands
Navigation | open, back, forward, reload, search, info
Snapshot | snapshot [-i] [-f|-F] [-o N] [-l N]
Element Interaction | click, double-click, hover, focus, fill, select, options, check, uncheck, scroll-to, drag, upload, fill-form
Keyboard | press, type, key-down, key-up
Mouse | scroll, mouse-move, mouse-click, mouse-drag, mouse-down, mouse-up
Wait | wait [SECONDS] [TEXT] [--gone]
Tabs | tabs, new-tab, switch-tab, close-tab
Evaluate | eval, eval-on
Capture | screenshot, pdf
Network | network-start, network-stop, network, wait-network
Dialog | dialog-setup, dialog, dialog-remove
Storage | storage-save, storage-load, cookies-clear, cookies, cookie-set
Verify | verify-visible, verify-text, verify-value, verify-state, verify-url, verify-title
Developer | console-start, console-stop, console, trace-start, trace-stop, trace-chunk, video-start, video-stop
Lifecycle | close, resize

Use -h or --help on any command for details:

bridgic-browser -h
bridgic-browser scroll -h

Python Tools

Bridgic Browser provides 67 tools organized into 15 categories. Use BrowserToolSetBuilder with category/name selection for scenario-focused tool sets.

Category-based Selection

from bridgic.browser.tools import BrowserToolSetBuilder, ToolCategory

# Focused set for your specific agent flows
builder = BrowserToolSetBuilder.for_categories(
    browser,
    ToolCategory.NAVIGATION,
    ToolCategory.ELEMENT_INTERACTION,
    ToolCategory.CAPTURE,
)
tools = builder.build()["tool_specs"]

# Include all available tools
builder = BrowserToolSetBuilder.for_categories(browser, ToolCategory.ALL)
tools = builder.build()["tool_specs"]

Name-based Selection (by function name)

# Select by tool function names
builder = BrowserToolSetBuilder.for_tool_names(
    browser,
    "search",
    "navigate_to",
    "click_element_by_ref",
)
tools = builder.build()["tool_specs"]

# Enable strict mode to catch typos and missing browser methods early
builder = BrowserToolSetBuilder.for_tool_names(
    browser,
    "search",
    "navigate_to",
    strict=True,
)
tools = builder.build()["tool_specs"]

Mixed Selection

builder1 = BrowserToolSetBuilder.for_categories(
    browser,
    ToolCategory.NAVIGATION,
    ToolCategory.ELEMENT_INTERACTION,
    ToolCategory.CAPTURE,
)
builder2 = BrowserToolSetBuilder.for_tool_names(
    browser, "verify_url", "verify_title"
)
tools = [*builder1.build()["tool_specs"], *builder2.build()["tool_specs"]]

Tool List

Navigation (6 tools):

  • navigate_to(url) - Navigate to URL
  • search(query, engine) - Search using search engine
  • get_current_page_info() - Get current page info (URL, title, etc.)
  • reload_page() - Reload current page
  • go_back() / go_forward() - Browser history navigation

Snapshot (1 tool):

  • get_snapshot_text(offset=0, limit=10000, interactive=False, full_page=True) - Get the page state as a string for the LLM (accessibility tree with refs). limit (default 10000) caps the number of characters returned. offset must be >= 0 and is used to page through long output: when the return value is truncated, a [notice] placed before the page content gives the next_offset to pass in the next call. interactive and full_page behave as in get_snapshot: interactive=True returns only clickable/editable elements, and full_page=True (the default) includes elements outside the viewport.
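
The offset/limit paging described for get_snapshot_text follows a common pattern: request a window, and keep requesting from next_offset until nothing is left. The sketch below models that loop with a stand-in function over a plain string. The exact [notice] format is not given here, so the stand-in get_page (a hypothetical name) returns next_offset directly rather than embedding it in the text.

```python
from typing import Optional, Tuple


def get_page(text: str, offset: int = 0, limit: int = 10000) -> Tuple[str, Optional[int]]:
    """Stand-in for a paginated call: return (chunk, next_offset or None when done)."""
    if offset < 0:
        raise ValueError("offset must be >= 0")
    if limit <= 0:
        raise ValueError("limit must be > 0")
    chunk = text[offset:offset + limit]
    end = offset + len(chunk)
    next_offset = end if end < len(text) else None
    return chunk, next_offset


def read_all(text: str, limit: int = 10000) -> str:
    """Collect every page by following next_offset until it is None."""
    parts = []
    offset: Optional[int] = 0
    while offset is not None:
        chunk, offset = get_page(text, offset=offset, limit=limit)
        parts.append(chunk)
    return "".join(parts)


# A 25,000-character "snapshot" needs three calls at the default limit.
snapshot = "x" * 25000
print(len(read_all(snapshot)))  # 25000
```

An agent would normally stop paging as soon as it finds the element it needs, rather than reading every page.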

Element Interaction (13 tools) - by ref:

  • click_element_by_ref(ref) - Click element
  • input_text_by_ref(ref, text) - Input text
  • fill_form(fields) - Fill multiple form fields
  • scroll_element_into_view_by_ref(ref) - Scroll element into view
  • select_dropdown_option_by_ref(ref, value) - Select dropdown option
  • get_dropdown_options_by_ref(ref) - Get dropdown options
  • check_checkbox_or_radio_by_ref(ref) / uncheck_checkbox_by_ref(ref) - Checkbox control
  • focus_element_by_ref(ref) - Focus element
  • hover_element_by_ref(ref) - Hover over element
  • double_click_element_by_ref(ref) - Double click
  • upload_file_by_ref(ref, path) - Upload file
  • drag_element_by_ref(start_ref, end_ref) - Drag and drop

Tabs (4 tools):

  • get_tabs() / new_tab(url) / switch_tab(page_id) / close_tab(page_id) - Tab management

Evaluate (2 tools):

  • evaluate_javascript(code) - Execute JavaScript
  • evaluate_javascript_on_ref(ref, code) - Execute JavaScript on element

Keyboard (4 tools):

  • type_text(text) - Type text character by character (key events, no ref — acts on focused element)
  • press_key(key) - Press keyboard shortcut (e.g. "Enter", "Control+A")
  • key_down(key) / key_up(key) - Key control

Mouse (6 tools) - Coordinate-based:

  • mouse_wheel(delta_x, delta_y) - Scroll wheel
  • mouse_click(x, y) - Click at position
  • mouse_move(x, y) - Move mouse
  • mouse_drag(start_x, start_y, end_x, end_y) - Drag operation
  • mouse_down() / mouse_up() - Mouse button control

Wait (1 tool):

  • wait_for(time_seconds, text, text_gone, selector, state, timeout) - Wait for conditions

Capture (2 tools):

  • take_screenshot(filename=None, ref=None, full_page=False, type="png") - Capture screenshot
  • save_pdf(filename) - Save page as PDF

Network (4 tools):

  • start_network_capture() / stop_network_capture() / get_network_requests() - Network monitoring
  • wait_for_network_idle() - Wait for network idle

Dialog (3 tools):

  • setup_dialog_handler(default_action) - Set up auto dialog handler
  • handle_dialog(accept, prompt_text) - Handle dialog
  • remove_dialog_handler() - Remove dialog handler

Storage (5 tools):

  • get_cookies() / set_cookie() / clear_cookies() - Cookie management (expires=0 is valid and preserved)
  • save_storage_state(filename) / restore_storage_state(filename) - Session persistence

Verify (6 tools):

  • verify_text_visible(text) - Check text visibility
  • verify_element_visible(role, accessible_name) - Check element visibility by role and accessible name
  • verify_url(pattern) / verify_title(pattern) - URL/title verification
  • verify_element_state(ref, state) - Check element state
  • verify_value(ref, value) - Check element value

Developer (8 tools):

  • start_console_capture() / stop_console_capture() / get_console_messages() - Console monitoring
  • start_tracing() / stop_tracing() / add_trace_chunk() - Performance tracing
  • start_video() / stop_video() - Video recording

Lifecycle (2 tools):

  • close() - Close browser
  • browser_resize(width, height) - Resize viewport

CLI Tools -> Python Tools Mapping

CLI command | SDK tool method
open | navigate_to
search | search
info | get_current_page_info
reload | reload_page
back | go_back
forward | go_forward
snapshot | get_snapshot_text
click | click_element_by_ref
fill | input_text_by_ref
fill-form | fill_form
scroll-to | scroll_element_into_view_by_ref
select | select_dropdown_option_by_ref
options | get_dropdown_options_by_ref
check | check_checkbox_or_radio_by_ref
uncheck | uncheck_checkbox_by_ref
focus | focus_element_by_ref
hover | hover_element_by_ref
double-click | double_click_element_by_ref
upload | upload_file_by_ref
drag | drag_element_by_ref
tabs | get_tabs
new-tab | new_tab
switch-tab | switch_tab
close-tab | close_tab
eval | evaluate_javascript
eval-on | evaluate_javascript_on_ref
press | press_key
type | type_text
key-down | key_down
key-up | key_up
scroll | mouse_wheel
mouse-click | mouse_click
mouse-move | mouse_move
mouse-drag | mouse_drag
mouse-down | mouse_down
mouse-up | mouse_up
wait | wait_for
screenshot | take_screenshot
pdf | save_pdf
network-start | start_network_capture
network | get_network_requests
network-stop | stop_network_capture
wait-network | wait_for_network_idle
dialog-setup | setup_dialog_handler
dialog | handle_dialog
dialog-remove | remove_dialog_handler
cookies | get_cookies
cookie-set | set_cookie
cookies-clear | clear_cookies
storage-save | save_storage_state
storage-load | restore_storage_state
verify-text | verify_text_visible
verify-visible | verify_element_visible
verify-url | verify_url
verify-title | verify_title
verify-state | verify_element_state
verify-value | verify_value
console-start | start_console_capture
console | get_console_messages
console-stop | stop_console_capture
trace-start | start_tracing
trace-chunk | add_trace_chunk
trace-stop | stop_tracing
video-start | start_video
video-stop | stop_video
close | close
resize | browser_resize

Core Components

Browser

The main class for browser automation with automatic launch mode selection:

from bridgic.browser.session import Browser

# Isolated session (no persistence)
browser = Browser(
    headless=True,
    viewport={"width": 1600, "height": 900},
)

# Persistent session (with user data)
browser = Browser(
    headless=False,
    user_data_dir="./user_data",
    stealth=True,  # Enabled by default
)

Key Parameters:

Parameter | Type | Default | Description
headless | bool | True | Run in headless mode
viewport | dict | 1600x900 | Browser viewport size
user_data_dir | str/Path | None | Path for persistent context
stealth | bool/StealthConfig | True | Stealth mode configuration
channel | str | None | Browser channel (chrome, msedge, etc.)
proxy | dict | None | Proxy settings
downloads_path | str/Path | None | Download directory

Snapshot: Use get_snapshot(interactive=False, full_page=True) to get an EnhancedSnapshot with .tree (accessibility tree string) and .refs (ref → locator data). By default full_page=True includes all elements regardless of viewport position. Pass interactive=True for clickable/editable elements only (flattened output), or full_page=False to limit to viewport-only elements. Use get_element_by_ref(ref) to get a Playwright Locator from a ref (e.g. "1f79fe5e") for click, fill, etc.

StealthConfig

Configure stealth mode for bypassing bot detection:

from bridgic.browser.session import StealthConfig, Browser

# Custom stealth configuration
config = StealthConfig(
    enabled=True,
    disable_security=False,
)

browser = Browser(stealth=config, headless=False)

DownloadManager

Handle file downloads with proper filename preservation:

import asyncio
from bridgic.browser.session import Browser

async def main():
    # Pass downloads_path to Browser; it creates and manages the DownloadManager internally
    browser = Browser(downloads_path="./downloads", headless=True)
    await browser.navigate_to("https://example.com")  # lazy start triggers here

    # Access downloaded files via the built-in manager
    for file in browser.download_manager.downloaded_files:
        print(f"Downloaded: {file.file_name} ({file.file_size} bytes)")

    await browser.close()

asyncio.run(main())

Stealth Mode

Stealth mode is enabled by default and includes:

  • Headless mode: 50+ Chrome args + JS init script patching navigator.webdriver, window.chrome, WebGL, document.hasFocus(), visibilityState, and more. All patched functions spoof Function.prototype.toString to return [native code].
  • Headed mode: minimal ~11 flags only (matching real Chrome); JS patches are skipped entirely so third-party challenge iframes (e.g. Cloudflare Turnstile) see unmodified native APIs.

# Stealth is ON by default
browser = Browser()  # stealth=True

# Disable stealth if needed
browser = Browser(stealth=False)

# Custom stealth settings
from bridgic.browser.session import create_stealth_config

config = create_stealth_config(
    disable_security=True,
)
browser = Browser(stealth=config)

Error Model

SDK and CLI share one structured error protocol.

  • Base type: BridgicBrowserError
  • Stable fields: code, message, details, retryable
  • Behavior subclasses:
    • InvalidInputError (invalid arguments/user input)
    • StateError (invalid runtime state, e.g. no active page/session)
    • OperationError (operation execution failures)
    • VerificationError (assertion/verification failures)

Why keep a small number of behavior subclasses:

  • Lets callers catch by behavior when needed (e.g. retry only StateError)
  • Encodes default retry semantics close to the failure source
  • Avoids a large, hard-to-maintain class hierarchy while keeping error handling predictable
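
Catching by behavior can be sketched as a small retry wrapper. The classes below are local stand-ins that mirror the names and stable fields listed above so the example runs without the library installed. Their constructor signatures, the error code "NO_ACTIVE_PAGE", and the helper call_with_retry are assumptions for illustration only.

```python
import time


class BridgicBrowserError(Exception):
    """Local stand-in carrying the stable fields: code, message, details, retryable."""

    def __init__(self, code, message, details=None, retryable=False):
        super().__init__(message)
        self.code = code
        self.message = message
        self.details = details or {}
        self.retryable = retryable


class StateError(BridgicBrowserError):
    """Local stand-in for invalid runtime state, e.g. no active page."""


def call_with_retry(operation, attempts=3, delay_seconds=0.0):
    """Retry only StateError marked retryable; let every other error propagate."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except StateError as exc:
            if not exc.retryable or attempt == attempts:
                raise
            time.sleep(delay_seconds)


# Demo: the operation fails once with a retryable StateError, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 2:
        raise StateError("NO_ACTIVE_PAGE", "no active page", retryable=True)
    return "ok"


print(call_with_retry(flaky))  # ok
```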

Daemon protocol is also structured:

  • Success: {"success": true, "result": "..."}
  • Failure: {"success": false, "error_code": "...", "result": "...", "data": {...}, "meta": {"retryable": false}}

The CLI client converts daemon failures into BridgicBrowserCommandError, and CLI output keeps the machine-readable code visible as Error[CODE]: ....
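
A client-side view of this protocol might look like the sketch below, which decodes one daemon response and either returns the result or raises an error that prints as Error[CODE]: .... The exception class is a local stand-in with an assumed constructor, and the sketch assumes the failure payload's "result" field holds the human-readable message. The code "ELEMENT_NOT_FOUND" is invented for the demo.

```python
import json


class BridgicBrowserCommandError(Exception):
    """Local stand-in for the error raised by the CLI client on daemon failures."""

    def __init__(self, code, message, data=None, retryable=False):
        super().__init__(message)
        self.code = code
        self.message = message
        self.data = data or {}
        self.retryable = retryable

    def __str__(self):
        # Keep the machine-readable code visible in human-facing output.
        return f"Error[{self.code}]: {self.message}"


def parse_daemon_response(raw: str) -> str:
    """Return the result on success; raise a structured error on failure."""
    payload = json.loads(raw)
    if payload.get("success"):
        return payload.get("result", "")
    raise BridgicBrowserCommandError(
        code=payload.get("error_code", "UNKNOWN"),
        message=payload.get("result", ""),
        data=payload.get("data"),
        retryable=bool(payload.get("meta", {}).get("retryable", False)),
    )


failure = (
    '{"success": false, "error_code": "ELEMENT_NOT_FOUND", '
    '"result": "ref not found", "data": {}, "meta": {"retryable": false}}'
)
try:
    parse_daemon_response(failure)
except BridgicBrowserCommandError as exc:
    print(exc)  # Error[ELEMENT_NOT_FOUND]: ref not found
```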

Requirements

  • Python 3.10+
  • Playwright 1.57+
  • Pydantic 2.11+

License

MIT License

More documentation
