servo-fetch

Security Audit
Warning
Health Warning
  • License — MIT
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 8 GitHub stars
Code Passed
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions Passed
  • Permissions — No dangerous permissions requested
Purpose
This tool embeds the Servo browser engine into a standalone binary. It fetches, renders, and extracts web content into clean Markdown or screenshots, and can even execute JavaScript, all without requiring external dependencies like Google Chrome or an API key.

Security Assessment
Overall risk: Low. The tool is designed to make network requests (fetching and crawling URLs) and execute JavaScript within its embedded browser engine, which is its intended behavior. A light source code scan of 12 files found no dangerous patterns, hardcoded secrets, or hidden shell execution loops. The tool does not request dangerous system permissions. The only minor security note is that the recommended installation method uses an unauthenticated `curl | sh` script, which is common but inherently requires trusting the repository maintainer. Downloading prebuilt binaries via `cargo binstall` or verifying the script before running it is a safer alternative.

Quality Assessment
The project is highly active, with its last code push occurring today. It is properly licensed under the permissive MIT license. However, it currently has low community visibility with only 8 GitHub stars, meaning it has not yet been extensively peer-reviewed or battle-tested by a wide audience. Being written in Rust provides strong memory safety guarantees, and the README is exceptionally well-documented, featuring clear usage instructions and benchmarks.

Verdict
Safe to use. The residual risk from its low community visibility can be reduced by reviewing the installation script before executing it.
Summary

A self-contained browser engine that fetches, renders, and extracts web content — no Chrome, no API key, no setup.

README.md

servo-fetch

A self-contained browser engine that fetches, renders, and extracts web content. No Chrome, no API key, no setup.


servo-fetch embeds the Servo browser engine into a single binary. It executes JavaScript, computes CSS layout, captures screenshots with a software renderer, and extracts clean content.

servo-fetch "https://example.com"                        # Clean Markdown
servo-fetch "https://example.com" --screenshot page.png  # PNG screenshot, no GPU needed
servo-fetch "https://example.com" --js "document.title"  # Run JS in the page
servo-fetch URL1 URL2 URL3                               # Parallel batch fetch
servo-fetch crawl "https://docs.example.com" --limit 20  # Crawl a site (BFS)

Why servo-fetch

  • Zero dependencies — single binary, no Chrome, no Docker, no API key
  • Real JS execution — SpiderMonkey runs JavaScript, parallel CSS engine computes layout
  • Layout-aware extraction — strips navbars, sidebars, footers by actual rendered position, not HTML guessing
  • Parallel batch fetch — multiple URLs fetched concurrently, results stream as each completes
  • Site crawling — BFS link traversal with robots.txt, same-site scope, and rate limiting
  • Screenshots without GPU — software renderer captures PNG/full-page screenshots anywhere
  • Accessibility tree — AccessKit integration with roles, names, and bounding boxes

Performance

Parallel fetch — 4 URLs, JS executed, full CSS rendering:

Tool         Peak Memory   Time
servo-fetch  114 MB        1.5s
Playwright   502 MB        3.3s
Puppeteer    1065 MB       4.3s

Same rendering capabilities, 4–9× less memory, 2–3× faster. See the benchmark methodology for details.
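The headline multiples follow directly from the table; a quick arithmetic check:

```python
# Peak memory (MB) and wall time (s) from the benchmark table above.
results = {
    "servo-fetch": (114, 1.5),
    "Playwright": (502, 3.3),
    "Puppeteer": (1065, 4.3),
}

base_mem, base_time = results["servo-fetch"]
for tool, (mem, secs) in results.items():
    if tool == "servo-fetch":
        continue
    print(f"{tool}: {mem / base_mem:.1f}x memory, {secs / base_time:.1f}x time")
```

This gives roughly 4.4×/9.3× the memory and 2.2×/2.9× the time for Playwright and Puppeteer respectively, matching the stated ranges.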

Install

curl -fsSL https://raw.githubusercontent.com/konippi/servo-fetch/main/install.sh | sh

Or via GitHub Releases, or with Cargo (requires Rust 1.86.0+):

cargo binstall servo-fetch   # prebuilt binary
cargo install servo-fetch    # build from source

Platform notes

Linux — runtime dependencies and headless setup

The Linux binary dynamically links against system libraries. Install them with:

# Debian/Ubuntu
sudo apt install -y libegl1 libfontconfig1 libfreetype6

# Fedora
sudo dnf install -y mesa-libEGL fontconfig freetype

# Arch
sudo pacman -S --needed mesa fontconfig freetype2

servo-fetch needs a working OpenGL ES context, so on headless servers (SSH/container) run it under a virtual display:

xvfb-run --auto-servernum servo-fetch "https://example.com"

Windows — zip layout

Windows releases ship as a .zip containing servo-fetch.exe alongside libEGL.dll and libGLESv2.dll — keep them in the same directory. Download from Releases, extract, and put the folder on your PATH.

macOS — no extra setup

No runtime dependencies. The release binary is ready to run.

Usage

Examples

# Readable Markdown (default)
servo-fetch "https://example.com"

# Structured JSON
servo-fetch "https://example.com" --json

# Multiple URLs in parallel (Markdown with separators)
servo-fetch "https://a.com" "https://b.com" "https://c.com"

# Multiple URLs as NDJSON (one compact JSON per line)
servo-fetch "https://a.com" "https://b.com" --json

# Screenshot — rendered to PNG without GPU
servo-fetch "https://example.com" --screenshot page.png

# Full-page screenshot (captures the entire scrollable page)
servo-fetch "https://example.com" --screenshot page.png --full-page

# Execute JavaScript in the page context
servo-fetch "https://example.com" --js "document.title"

# Extract a specific section by CSS selector
servo-fetch "https://example.com" --selector "article"

# Raw HTML or plain text (bypasses Readability)
servo-fetch "https://example.com" --raw html
servo-fetch "https://example.com" --raw text

# PDF text extraction (auto-detected via Content-Type)
servo-fetch "https://example.com/report.pdf"

# Crawl a site by following links (BFS, respects robots.txt)
servo-fetch crawl "https://docs.example.com" --limit 20 --max-depth 3

# Crawl with path filtering
servo-fetch crawl "https://docs.example.com" --include "/docs/**" --exclude "/docs/archive/**"
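The --include/--exclude patterns use path globs where ** crosses directory boundaries. A minimal sketch of that matching logic, assuming conventional globstar semantics (servo-fetch's actual glob implementation may differ):

```python
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob to a regex: `**` crosses `/` boundaries,
    a single `*` matches within one path segment only."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + "$")

include = glob_to_regex("/docs/**")
exclude = glob_to_regex("/docs/archive/**")

def wanted(path: str) -> bool:
    # A path is crawled when it matches an include and no exclude.
    return bool(include.match(path)) and not exclude.match(path)

print(wanted("/docs/guide/intro"))      # True
print(wanted("/docs/archive/old-api"))  # False
```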

Options

Flag                  Description
--json                Output as structured JSON (NDJSON when multiple URLs)
--screenshot <FILE>   Save a PNG screenshot (single URL only)
--full-page           Capture the full scrollable page (requires --screenshot)
--js <EXPR>           Execute JavaScript and print the result (single URL only)
--selector <CSS>      Extract a specific section by CSS selector
--raw <MODE>          Output raw html or plain text (single URL only)
-t, --timeout <SECS>  Page load timeout (default: 30)
--settle <MS>         Extra wait after load event for SPAs (default: 0, max: 10000)
--help                Show help
--version             Show version

When multiple URLs are given, they are fetched in parallel. Results stream to stdout in completion order — Markdown with --- URL --- separators by default, or NDJSON with --json.
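NDJSON is easy to consume downstream: each stdout line is one complete JSON object. A minimal parsing sketch, with sample lines standing in for real servo-fetch output:

```python
import json

# Stand-in for `servo-fetch "https://a.com" "https://b.com" --json` output:
# one compact JSON object per line, emitted in completion order.
ndjson = "\n".join([
    '{"title": "B", "text_content": "# B", "url": "https://b.com/"}',
    '{"title": "A", "text_content": "# A", "url": "https://a.com/"}',
])

pages = [json.loads(line) for line in ndjson.splitlines() if line.strip()]
for page in pages:
    print(page["url"], "->", page["title"])
```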

Crawl subcommand

servo-fetch crawl <URL> follows links within the same site using BFS. Output is always NDJSON (one JSON object per page).

Flag                  Description
--limit <N>           Maximum pages to crawl (default: 50)
--max-depth <N>       Maximum link depth from seed URL (default: 3)
--include <GLOB>      URL path patterns to include (e.g. "/docs/**")
--exclude <GLOB>      URL path patterns to exclude
--json                Output content as JSON instead of Markdown per page
--selector <CSS>      Extract a specific section per page
-t, --timeout <SECS>  Per-page timeout (default: 30)
--settle <MS>         Extra wait after load event per page

Crawl respects robots.txt (RFC 9309) and enforces a minimum 500ms interval between requests.
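The traversal itself is plain breadth-first search over same-site links, bounded by page and depth limits. A schematic sketch (fetching, link extraction, and robots.txt handling elided; LINKS is a toy graph standing in for extracted hrefs):

```python
from collections import deque

# Toy same-site link graph standing in for extracted <a href> targets.
LINKS = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/a", "/docs/b"],
    "/docs/a": ["/docs/b"],
}

def crawl(seed: str, limit: int = 50, max_depth: int = 3) -> list[str]:
    seen = {seed}
    queue = deque([(seed, 0)])
    order = []
    while queue and len(order) < limit:
        url, depth = queue.popleft()
        order.append(url)  # fetch + extract would happen here,
                           # throttled to >= 500 ms between requests
        if depth < max_depth:
            for link in LINKS.get(url, []):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return order

print(crawl("/", limit=4))
```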

JSON output

--json returns an object with these fields:

Field         Type     Description
title         string   Page title
content       string   Raw HTML extracted by Readability
text_content  string   Readable text (Markdown)
byline        string?  Author or byline
excerpt       string?  Short excerpt or description
lang          string?  Document language (e.g. "en")
url           string?  Canonical URL

Fields marked ? are omitted when not detected.
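Because optional fields are omitted rather than set to null, read them with a default. A short sketch using a sample object:

```python
import json

# Sample object in the documented shape; `byline`, `excerpt`,
# and `url` are absent here, not null.
sample = json.loads(
    '{"title": "Example Domain", "content": "<p>...</p>", '
    '"text_content": "Example Domain", "lang": "en"}'
)

byline = sample.get("byline", "(no byline)")
lang = sample.get("lang", "unknown")
print(byline, lang)
```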

MCP server

servo-fetch includes a built-in MCP server with five tools — fetch, batch_fetch, crawl, screenshot, and execute_js — over stdio or Streamable HTTP.

{
  "mcpServers": {
    "servo-fetch": {
      "command": "servo-fetch",
      "args": ["mcp"]
    }
  }
}

For Streamable HTTP transport:

servo-fetch mcp --port 8080
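MCP tool invocations are JSON-RPC 2.0 messages. A sketch of what a client would send to call the fetch tool (shape per the MCP specification, not captured wire traffic):

```python
import json

# JSON-RPC 2.0 request invoking the `fetch` tool via MCP `tools/call`.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "fetch",
        "arguments": {"url": "https://example.com", "format": "markdown"},
    },
}
print(json.dumps(request))
```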

Tools

fetch — extract readable content from a URL

Parameter    Type     Description
url          string   URL to fetch (http/https only)
format       string?  markdown (default), json, html, text, or accessibility_tree
max_length   number?  Max characters to return (default 5000)
start_index  number?  Character offset for pagination (default 0)
timeout      number?  Page load timeout in seconds (default 30)
settle_ms    number?  Extra wait in ms after load event for SPAs (default 0, max 10000)
selector     string?  CSS selector to extract a specific section
batch_fetch — fetch multiple URLs in parallel

Parameter   Type      Description
urls        string[]  URLs to fetch (http/https only, max 20)
format      string?   markdown (default) or json
max_length  number?   Max characters per URL result (default 5000)
timeout     number?   Page load timeout in seconds per URL (default 30)
settle_ms   number?   Extra wait in ms after load event (default 0, max 10000)
selector    string?   CSS selector to extract a specific section
crawl — crawl a website by following links

Parameter     Type       Description
url           string     Starting URL (http/https only)
limit         number?    Maximum pages to crawl (default 20, max 500)
max_depth     number?    Maximum link depth from seed (default 3, max 10)
format        string?    markdown (default) or json
include_glob  string[]?  URL path patterns to include
exclude_glob  string[]?  URL path patterns to exclude
max_length    number?    Max characters per page result (default 5000)
timeout       number?    Page load timeout in seconds per page (default 30)
settle_ms     number?    Extra wait in ms after load event (default 0, max 10000)
selector      string?    CSS selector to extract a specific section per page

Follows same-site links only. Respects robots.txt. Results stream as each page completes.

screenshot — capture a PNG screenshot (no GPU required)

Parameter  Type      Description
url        string    URL to capture (http/https only)
full_page  boolean?  Capture the full scrollable page (default false)
timeout    number?   Page load timeout in seconds (default 30)
settle_ms  number?   Extra wait in ms after load event (default 0, max 10000)

execute_js — evaluate JavaScript in a loaded page

Parameter   Type     Description
url         string   URL to load before executing JS
expression  string   JavaScript expression to evaluate
timeout     number?  Page load timeout in seconds (default 30)
settle_ms   number?  Extra wait in ms after load event (default 0, max 10000)

Agent Skills

servo-fetch ships with an Agent Skills package for AI coding agents. Install with npx skills:

npx skills add https://github.com/konippi/servo-fetch/tree/main/skills/servo-fetch

Security

servo-fetch blocks all private and reserved IP ranges (RFC 6890), strips credentials from URLs, disables HTTP redirects to prevent SSRF bypass, and sanitizes all output against terminal escape injection (CVE-2021-42574). See SECURITY.md for details.
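The private- and reserved-range check corresponds to what Python's ipaddress module exposes. A sketch of the same guard, illustrative only and not servo-fetch's actual code:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_blocked(url: str) -> bool:
    """Reject URLs whose host resolves to a private, reserved,
    loopback, or link-local address (RFC 6890 special-purpose ranges)."""
    host = urlparse(url).hostname or ""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # fail closed on unresolvable hosts
    for *_, sockaddr in infos:
        addr = ipaddress.ip_address(sockaddr[0])
        if addr.is_private or addr.is_reserved or addr.is_loopback or addr.is_link_local:
            return True
    return False

print(is_blocked("http://127.0.0.1/admin"))   # True
print(is_blocked("http://169.254.169.254/"))  # True (cloud metadata range)
```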

Limitations

  • Servo's web compatibility is improving monthly but does not yet match Chromium. Some SPAs with complex client-side rendering may not fully render.
  • Best results on documentation, blogs, news sites, and server-rendered pages.
  • Sites behind login walls or CAPTCHAs are not supported.

Contributing

See CONTRIBUTING.md for development setup, commit conventions, and PR guidelines.

License

MIT
