
# Spider


Website | Guides | API Docs | Examples | Discord

A high-performance web crawler and scraper for Rust. 200-1000x faster than popular alternatives, with HTTP, headless Chrome, and WebDriver rendering in a single library.

## Quick Start

### Command Line

```sh
cargo install spider_cli
spider --url https://example.com
```

### Rust

```toml
[dependencies]
spider = "2"
```

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    website.crawl().await;
    println!("Pages found: {}", website.get_links().len());
}
```

## Streaming

Process each page the moment it's crawled, not after:

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}
```

## Headless Chrome

Add one feature flag to render JavaScript-heavy pages:

```toml
[dependencies]
spider = { version = "2", features = ["chrome"] }
```

```rust
use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://example.com")
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}
```

Also supports WebDriver (Selenium Grid, remote browsers) and AI-driven automation. See examples for more.

## Benchmarks

Crawling 185 pages (source, 10 samples averaged):

**Apple M1 Max (10-core, 64 GB RAM):**

| Crawler | Language | Time | vs Spider |
|---|---|---|---|
| spider | Rust | 73 ms | baseline |
| node-crawler | JavaScript | 15 s | 205x slower |
| colly | Go | 32 s | 438x slower |
| wget | C | 70 s | 959x slower |

**Linux (2-core, 7 GB RAM):**

| Crawler | Language | Time | vs Spider |
|---|---|---|---|
| spider | Rust | 50 ms | baseline |
| node-crawler | JavaScript | 3.4 s | 68x slower |
| colly | Go | 30 s | 600x slower |
| wget | C | 60 s | 1200x slower |

The gap grows with site size: Spider handles 100k+ pages in minutes where other crawlers take hours. The speed comes from Rust's async runtime (tokio), lock-free data structures, and optional io_uring on Linux. Full details.

## Why Spider?

Most crawlers force a choice between fast HTTP-only crawling and slow-but-flexible browser automation. Spider supports both, and you can mix them in the same crawl.

**Supports HTTP, Chrome, and WebDriver.** Switch rendering modes with a feature flag. Use HTTP for speed, Chrome CDP for JavaScript-heavy pages, and WebDriver for Selenium Grid or cross-browser testing.
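Switching modes is a Cargo.toml change, not a code change. Both variants below appear elsewhere in this README; the commented line swaps HTTP-only crawling for headless Chrome rendering:

```toml
[dependencies]
# HTTP-only (default, fastest)
spider = "2"
# Headless Chrome rendering instead:
# spider = { version = "2", features = ["chrome"] }
```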

**Built for production.** Caching (memory, disk, hybrid), proxy rotation, anti-bot fingerprinting, ad blocking, depth budgets, cron scheduling, and distributed workers. All of this has been hardened through Spider Cloud.

**AI automation included.** spider_agent adds multimodal LLM-driven automation: navigate pages, fill forms, solve challenges, and extract structured data with OpenAI or any compatible API.

## Features

### Crawling

- Concurrent and streaming crawls with backpressure
- Decentralized crawling for horizontal scaling
- Caching: memory, disk (SQLite), or hybrid Chrome cache
- Proxy support with rotation
- Cron job scheduling
- Depth budgeting, blacklisting, whitelisting
- Smart mode that auto-detects JS-rendered content and upgrades to Chrome
### AI Agent

- spider_agent: concurrent-safe multimodal web automation agent
- Multiple LLM providers (OpenAI, any OpenAI-compatible API, Chrome built-in AI)
- Web research with search providers (Serper, Brave, Bing, Tavily)
- 110 built-in automation skills for web challenges

## Spider Cloud

For managed proxy rotation, anti-bot bypass, and CAPTCHA handling, Spider Cloud plugs in with one line:

```rust
// enable with features = ["spider_cloud"]
let mut website = Website::new("https://protected-site.com")
    .with_spider_cloud("your-api-key")
    .build()
    .unwrap();
```

| Mode | Strategy | Best For |
|---|---|---|
| Proxy (default) | All traffic through the Spider Cloud proxy | General crawling with IP rotation |
| Smart (recommended) | Proxy + auto-fallback on bot detection | Production (speed + reliability) |
| Fallback | Direct first, API on failure | Cost efficiency; most sites work without help |
| Unblocker | All requests through the unblocker | Aggressive bot protection |

Free credits on signup. Get started at spider.cloud

## Spider Browser Cloud

Connect to a remote Rust-based browser via CDP over WebSocket for automation, scraping, and AI extraction:

```rust
use spider::configuration::SpiderBrowserConfig;
use spider::website::Website;

// Simple: just an API key (features = ["spider_cloud", "chrome"])
let mut website = Website::new("https://example.com")
    .with_spider_browser("your-api-key")
    .build()
    .unwrap();

// Full config: stealth, country targeting, custom options
let browser_cfg = SpiderBrowserConfig::new("your-api-key")
    .with_stealth(true)
    .with_country("us");

let mut website = Website::new("https://example.com")
    .with_spider_browser_config(browser_cfg)
    .build()
    .unwrap();
```

WebSocket endpoint: `wss://browser.spider.cloud/v1/browser` (supports the CDP and WebDriver BiDi protocols).

## Parallel Backends (LightPanda / Servo)

Race alternative browser engines alongside the primary crawl. The best HTML response wins, giving higher reliability and coverage for JS-heavy pages.

```rust
use spider::configuration::{BackendEndpoint, BackendEngine, ParallelBackendsConfig};

let mut website = Website::new("https://example.com");

// Race a remote LightPanda instance alongside the primary crawl.
website.configuration.parallel_backends = Some(ParallelBackendsConfig {
    backends: vec![BackendEndpoint {
        engine: BackendEngine::LightPanda,
        endpoint: Some("ws://127.0.0.1:9222".to_string()),
        binary_path: None,
        protocol: None,
        proxy: None, // inherits from the website's proxy config
    }],
    grace_period_ms: 500,       // wait up to 500 ms for a better result
    fast_accept_threshold: 80,  // accept immediately if quality >= 80
    ..Default::default()
});

website.crawl().await;
```

Feature flags: `lightpanda` (LightPanda via CDP), `servo` (Servo via WebDriver), `parallel_backends_full` (both).

The implementation is lock-free, adds zero overhead when disabled, and tracks backend health automatically, disabling a backend after consecutive failures.
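As a Cargo.toml sketch of those feature flags (only the feature names listed above are used):

```toml
[dependencies]
# Pick one alternative engine, or both via the umbrella feature:
spider = { version = "2", features = ["lightpanda"] }
# spider = { version = "2", features = ["servo"] }
# spider = { version = "2", features = ["parallel_backends_full"] }
```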

## Get Spider

| Package | Language | Install |
|---|---|---|
| spider | Rust | `cargo add spider` |
| spider_cli | CLI | `cargo install spider_cli` |
| spider-nodejs | Node.js | `npm i @spider-rs/spider-rs` |
| spider-py | Python | `pip install spider_rs` |
| spider_agent | Rust | `cargo add spider --features agent` |
| spider_mcp | MCP | `cargo install spider_mcp` |

## MCP Server

Use Spider as tools in Claude Code, Claude Desktop, or any MCP client:

```sh
cargo install spider_mcp
```

```json
{ "mcpServers": { "spider": { "command": "spider-mcp" } } }
```

Then ask: "Scrape https://example.com as markdown" or "Crawl https://example.com up to 5 pages".

## Cloud and Remote

| Package | Description |
|---|---|
| Spider Cloud | Managed crawling infrastructure, no setup needed |
| spider-clients | SDKs for Spider Cloud in multiple languages |
| spider-browser | Remote access to Spider's Rust browser |


## Contributing

Contributions welcome. See CONTRIBUTING.md for setup and guidelines.

Spider has been actively developed for the past 4 years. Join the Discord for questions and discussion.

## License

MIT
