servo-fetch

Security Audit
Warning
Health Warning
  • License — MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 8 GitHub stars
Code Passed
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Passed
  • Permissions — No dangerous permissions requested
Purpose
This tool embeds the Servo browser engine into a standalone binary. It fetches, renders, and extracts web content into clean Markdown or screenshots, and can even execute JavaScript, all without requiring external dependencies like Google Chrome or an API key.

Security Assessment
Overall risk: Low. The tool is designed to make network requests (fetching and crawling URLs) and execute JavaScript within its embedded browser engine, which is its intended behavior. A light source code scan of 12 files found no dangerous patterns, hardcoded secrets, or hidden shell execution loops. The tool does not request dangerous system permissions. The only minor security note is that the recommended installation method uses an unauthenticated `curl | sh` script, which is common but inherently requires trusting the repository maintainer. Downloading prebuilt binaries via `cargo binstall` or verifying the script before running it is a safer alternative.

Quality Assessment
The project is highly active, with its last code push occurring today. It is properly licensed under the permissive MIT license. However, it currently has low community visibility with only 8 GitHub stars, meaning it has not yet been extensively peer-reviewed or battle-tested by a wide audience. Being written in Rust provides strong memory safety guarantees, and the README is exceptionally well-documented, featuring clear usage instructions and benchmarks.

Verdict
Safe to use. The residual risk from its low community visibility can be reduced by reviewing the installation script before executing it.
Summary

A self-contained browser engine that fetches, renders, and extracts web content — no Chrome, no API key, no setup.

README.md

servo-fetch

A self-contained browser engine that fetches, renders, and extracts web content. No Chrome, no API key, no setup.


servo-fetch embeds the Servo browser engine into a single binary. It executes JavaScript, computes CSS layout, captures screenshots with a software renderer, and extracts clean content.

servo-fetch "https://example.com"                        # Clean Markdown
servo-fetch "https://example.com" --screenshot page.png  # PNG screenshot, no GPU needed
servo-fetch "https://example.com" --js "document.title"  # Run JS in the page
servo-fetch URL1 URL2 URL3                               # Parallel batch fetch
servo-fetch crawl "https://docs.example.com" --limit 20  # Crawl a site (BFS)

Why servo-fetch

  • Zero dependencies — single binary, no Chrome, no Docker, no API key
  • Real JS execution — SpiderMonkey runs JavaScript, parallel CSS engine computes layout
  • Layout-aware extraction — strips navbars, sidebars, footers by actual rendered position, not HTML guessing
  • Parallel batch fetch — multiple URLs fetched concurrently, results stream as each completes
  • Site crawling — BFS link traversal with robots.txt, same-site scope, and rate limiting
  • Screenshots without GPU — software renderer captures PNG/full-page screenshots anywhere
  • Accessibility tree — AccessKit integration with roles, names, and bounding boxes

Performance

Parallel fetch — 4 URLs, JS executed, full CSS rendering:

Tool         Peak Memory   Time
servo-fetch  114 MB        1.5s
Playwright   502 MB        3.3s
Puppeteer    1065 MB       4.3s

Same rendering capabilities, 4–9× less memory, 2–3× faster. See the benchmark methodology for details.
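The headline multiples follow directly from the table; a quick arithmetic check:

```python
# Peak memory (MB) and wall time (s) from the benchmark table above.
results = {
    "servo-fetch": (114, 1.5),
    "Playwright": (502, 3.3),
    "Puppeteer": (1065, 4.3),
}

base_mem, base_time = results["servo-fetch"]
for tool, (mem, secs) in results.items():
    if tool == "servo-fetch":
        continue
    print(f"{tool}: {mem / base_mem:.1f}x memory, {secs / base_time:.1f}x time")
```

This gives roughly 4.4×/9.3× the memory and 2.2×/2.9× the time for Playwright and Puppeteer respectively, matching the stated ranges.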

Install

curl -fsSL https://raw.githubusercontent.com/konippi/servo-fetch/main/install.sh | sh

Or via GitHub Releases, or with Cargo (requires Rust 1.86.0+):

cargo binstall servo-fetch   # prebuilt binary
cargo install servo-fetch    # build from source

Platform notes

Linux — runtime dependencies and headless setup

The Linux binary dynamically links against system libraries. Install them with:

# Debian/Ubuntu
sudo apt install -y libegl1 libfontconfig1 libfreetype6

# Fedora
sudo dnf install -y mesa-libEGL fontconfig freetype

# Arch
sudo pacman -S --needed mesa fontconfig freetype2

servo-fetch needs a working OpenGL ES context, so on headless servers (SSH/container) run it under a virtual display:

xvfb-run --auto-servernum servo-fetch "https://example.com"

Windows — zip layout

Windows releases ship as a .zip containing servo-fetch.exe alongside libEGL.dll and libGLESv2.dll — keep them in the same directory. Download from Releases, extract, and put the folder on your PATH.

macOS — no extra setup

No runtime dependencies. The release binary is ready to run.

Usage

Examples

# Readable Markdown (default)
servo-fetch "https://example.com"

# Structured JSON
servo-fetch "https://example.com" --json

# Multiple URLs in parallel (Markdown with separators)
servo-fetch "https://a.com" "https://b.com" "https://c.com"

# Multiple URLs as NDJSON (one compact JSON per line)
servo-fetch "https://a.com" "https://b.com" --json

# Screenshot — rendered to PNG without GPU
servo-fetch "https://example.com" --screenshot page.png

# Full-page screenshot (captures the entire scrollable page)
servo-fetch "https://example.com" --screenshot page.png --full-page

# Execute JavaScript in the page context
servo-fetch "https://example.com" --js "document.title"

# Extract a specific section by CSS selector
servo-fetch "https://example.com" --selector "article"

# Raw HTML or plain text (bypasses Readability)
servo-fetch "https://example.com" --raw html
servo-fetch "https://example.com" --raw text

# PDF text extraction (auto-detected via Content-Type)
servo-fetch "https://example.com/report.pdf"

# Crawl a site by following links (BFS, respects robots.txt)
servo-fetch crawl "https://docs.example.com" --limit 20 --max-depth 3

# Crawl with path filtering
servo-fetch crawl "https://docs.example.com" --include "/docs/**" --exclude "/docs/archive/**"
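The --include/--exclude patterns use path globs where ** crosses directory boundaries. A minimal sketch of that matching logic, assuming conventional globstar semantics (servo-fetch's actual glob implementation may differ):

```python
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob to a regex: `**` crosses `/` boundaries,
    a single `*` matches within one path segment only."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + "$")

include = glob_to_regex("/docs/**")
exclude = glob_to_regex("/docs/archive/**")

def wanted(path: str) -> bool:
    # A path is crawled when it matches an include and no exclude.
    return bool(include.match(path)) and not exclude.match(path)

print(wanted("/docs/guide/intro"))      # True
print(wanted("/docs/archive/old-api"))  # False
```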

Options

Flag                  Description
--json                Output as structured JSON (NDJSON when multiple URLs)
--screenshot <FILE>   Save a PNG screenshot (single URL only)
--full-page           Capture the full scrollable page (requires --screenshot)
--js <EXPR>           Execute JavaScript and print the result (single URL only)
--selector <CSS>      Extract a specific section by CSS selector
--raw <MODE>          Output raw html or plain text (single URL only)
-t, --timeout <SECS>  Page load timeout (default: 30)
--settle <MS>         Extra wait after load event for SPAs (default: 0, max: 10000)
--help                Show help
--version             Show version

When multiple URLs are given, they are fetched in parallel. Results stream to stdout in completion order — Markdown with --- URL --- separators by default, or NDJSON with --json.
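NDJSON is easy to consume downstream: each stdout line is one complete JSON object. A minimal parsing sketch, with sample lines standing in for real servo-fetch output:

```python
import json

# Stand-in for `servo-fetch "https://a.com" "https://b.com" --json` output:
# one compact JSON object per line, emitted in completion order.
ndjson = "\n".join([
    '{"title": "B", "text_content": "# B", "url": "https://b.com/"}',
    '{"title": "A", "text_content": "# A", "url": "https://a.com/"}',
])

pages = [json.loads(line) for line in ndjson.splitlines() if line.strip()]
for page in pages:
    print(page["url"], "->", page["title"])
```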

Crawl subcommand

servo-fetch crawl <URL> follows links within the same site using BFS. Output is always NDJSON (one JSON object per page).

Flag                  Description
--limit <N>           Maximum pages to crawl (default: 50)
--max-depth <N>       Maximum link depth from seed URL (default: 3)
--include <GLOB>      URL path patterns to include (e.g. "/docs/**")
--exclude <GLOB>      URL path patterns to exclude
--json                Output content as JSON instead of Markdown per page
--selector <CSS>      Extract a specific section per page
-t, --timeout <SECS>  Per-page timeout (default: 30)
--settle <MS>         Extra wait after load event per page

Crawl respects robots.txt (RFC 9309) and enforces a minimum 500ms interval between requests.
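The traversal itself is plain breadth-first search over same-site links, bounded by page and depth limits. A schematic sketch (fetching, link extraction, and robots.txt handling elided; LINKS is a toy graph standing in for extracted hrefs):

```python
from collections import deque

# Toy same-site link graph standing in for extracted <a href> targets.
LINKS = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/a", "/docs/b"],
    "/docs/a": ["/docs/b"],
}

def crawl(seed: str, limit: int = 50, max_depth: int = 3) -> list[str]:
    seen = {seed}
    queue = deque([(seed, 0)])
    order = []
    while queue and len(order) < limit:
        url, depth = queue.popleft()
        order.append(url)  # fetch + extract would happen here,
                           # throttled to >= 500 ms between requests
        if depth < max_depth:
            for link in LINKS.get(url, []):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return order

print(crawl("/", limit=4))
```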

JSON output

--json returns an object with these fields:

Field         Type     Description
title         string   Page title
content       string   Raw HTML extracted by Readability
text_content  string   Readable text (Markdown)
byline        string?  Author or byline
excerpt       string?  Short excerpt or description
lang          string?  Document language (e.g. "en")
url           string?  Canonical URL

Fields marked ? are omitted when not detected.
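Because optional fields are omitted rather than set to null, read them with a default. A short sketch using a sample object:

```python
import json

# Sample object in the documented shape; `byline`, `excerpt`,
# and `url` are absent here, not null.
sample = json.loads(
    '{"title": "Example Domain", "content": "<p>...</p>", '
    '"text_content": "Example Domain", "lang": "en"}'
)

byline = sample.get("byline", "(no byline)")
lang = sample.get("lang", "unknown")
print(byline, lang)
```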

MCP server

servo-fetch includes a built-in MCP server with five tools — fetch, batch_fetch, crawl, screenshot, and execute_js — over stdio or Streamable HTTP.

{
  "mcpServers": {
    "servo-fetch": {
      "command": "servo-fetch",
      "args": ["mcp"]
    }
  }
}

For Streamable HTTP transport:

servo-fetch mcp --port 8080
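MCP tool invocations are JSON-RPC 2.0 messages. A sketch of what a client would send to call the fetch tool (shape per the MCP specification, not captured wire traffic):

```python
import json

# JSON-RPC 2.0 request invoking the `fetch` tool via MCP `tools/call`.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "fetch",
        "arguments": {"url": "https://example.com", "format": "markdown"},
    },
}
print(json.dumps(request))
```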

Tools

fetch — extract readable content from a URL

Parameter    Type     Description
url          string   URL to fetch (http/https only)
format       string?  markdown (default), json, html, text, or accessibility_tree
max_length   number?  Max characters to return (default 5000)
start_index  number?  Character offset for pagination (default 0)
timeout      number?  Page load timeout in seconds (default 30)
settle_ms    number?  Extra wait in ms after load event for SPAs (default 0, max 10000)
selector     string?  CSS selector to extract a specific section
batch_fetch — fetch multiple URLs in parallel

Parameter   Type      Description
urls        string[]  URLs to fetch (http/https only, max 20)
format      string?   markdown (default) or json
max_length  number?   Max characters per URL result (default 5000)
timeout     number?   Page load timeout in seconds per URL (default 30)
settle_ms   number?   Extra wait in ms after load event (default 0, max 10000)
selector    string?   CSS selector to extract a specific section
crawl — crawl a website by following links

Parameter     Type       Description
url           string     Starting URL (http/https only)
limit         number?    Maximum pages to crawl (default 20, max 500)
max_depth     number?    Maximum link depth from seed (default 3, max 10)
format        string?    markdown (default) or json
include_glob  string[]?  URL path patterns to include
exclude_glob  string[]?  URL path patterns to exclude
max_length    number?    Max characters per page result (default 5000)
timeout       number?    Page load timeout in seconds per page (default 30)
settle_ms     number?    Extra wait in ms after load event (default 0, max 10000)
selector      string?    CSS selector to extract a specific section per page

Follows same-site links only. Respects robots.txt. Results stream as each page completes.

screenshot — capture a PNG screenshot (no GPU required)

Parameter  Type      Description
url        string    URL to capture (http/https only)
full_page  boolean?  Capture the full scrollable page (default false)
timeout    number?   Page load timeout in seconds (default 30)
settle_ms  number?   Extra wait in ms after load event (default 0, max 10000)

execute_js — evaluate JavaScript in a loaded page

Parameter   Type     Description
url         string   URL to load before executing JS
expression  string   JavaScript expression to evaluate
timeout     number?  Page load timeout in seconds (default 30)
settle_ms   number?  Extra wait in ms after load event (default 0, max 10000)

Agent Skills

servo-fetch ships with an Agent Skills package for AI coding agents. Install with npx skills:

npx skills add https://github.com/konippi/servo-fetch/tree/main/skills/servo-fetch

Security

servo-fetch blocks all private and reserved IP ranges (RFC 6890), strips credentials from URLs, disables HTTP redirects to prevent SSRF bypass, and sanitizes all output against terminal escape injection (CVE-2021-42574). See SECURITY.md for details.
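The private- and reserved-range check corresponds to what Python's ipaddress module exposes. A sketch of the same guard, illustrative only and not servo-fetch's actual code:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_blocked(url: str) -> bool:
    """Reject URLs whose host resolves to a private, reserved,
    loopback, or link-local address (RFC 6890 special-purpose ranges)."""
    host = urlparse(url).hostname or ""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # fail closed on unresolvable hosts
    for *_, sockaddr in infos:
        addr = ipaddress.ip_address(sockaddr[0])
        if addr.is_private or addr.is_reserved or addr.is_loopback or addr.is_link_local:
            return True
    return False

print(is_blocked("http://127.0.0.1/admin"))   # True
print(is_blocked("http://169.254.169.254/"))  # True (cloud metadata range)
```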

Limitations

  • Servo's web compatibility is improving monthly but does not yet match Chromium. Some SPAs with complex client-side rendering may not fully render.
  • Best results on documentation, blogs, news sites, and server-rendered pages.
  • Sites behind login walls or CAPTCHAs are not supported.

Contributing

See CONTRIBUTING.md for development setup, commit conventions, and PR guidelines.

License

MIT
