mcp-wayback-machine
MCP server and CLI tool for interacting with the Internet Archive's Wayback Machine
MCP Wayback Machine Server
An MCP (Model Context Protocol) server and CLI tool for interacting with the Internet Archive's Wayback Machine. Supports full CDX search, snapshot content retrieval, screenshot listing, snapshot comparison, and optional authentication for higher SPN2 rate limits.
Installation
As an MCP server
CLI shorthand
Some agent harnesses provide a one-command install:
Claude Code (MCP):
claude mcp add wayback-machine -- npx -y mcp-wayback-machine
Claude Code (plugin marketplace):
/plugin marketplace add https://github.com/Mearman/mcp-wayback-machine.git
/plugin install mcp-wayback-machine@mcp-wayback-machine
OpenAI Codex:
codex mcp add wayback-machine -- npx -y mcp-wayback-machine
To include optional credentials:
claude mcp add wayback-machine --env WAYBACK_ACCESS_KEY=xxx --env WAYBACK_SECRET_KEY=xxx -- npx -y mcp-wayback-machine
codex mcp add wayback-machine --env WAYBACK_ACCESS_KEY=xxx --env WAYBACK_SECRET_KEY=xxx -- npx -y mcp-wayback-machine
Manual configuration
For harnesses that use config files, add the following to the appropriate section:
{
  "wayback-machine": {
    "command": "npx",
    "args": ["-y", "mcp-wayback-machine"],
    "env": {
      "WAYBACK_ACCESS_KEY": "your-access-key",
      "WAYBACK_SECRET_KEY": "your-secret-key"
    }
  }
}
| Harness | Config file | Config key |
|---|---|---|
| Claude Code | .mcp.json (project) / ~/.claude.json (user) | mcpServers |
| Codex | ~/.codex/config.toml | [mcp_servers.wayback-machine] |
| Gemini CLI | ~/.gemini/settings.json | mcpServers |
| Crush | .crush.json / ~/.config/crush/crush.json | mcp |
| Cline | .cline/mcp.json | mcpServers |
| Cursor | .cursor/mcp.json | mcpServers |
| Zed | ~/.config/zed/settings.json | context_servers |
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json | mcpServers |
The env block is optional — the server works anonymously without credentials. See Credentials for details.
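As a rough illustration of how the JSON entry and the config-key column fit together, the sketch below builds the entry and nests it under a harness's key. The helper names `buildServerEntry` and `wrapForHarness` are hypothetical and not part of mcp-wayback-machine; the sketch only covers the JSON-based harnesses, since the Codex row uses a TOML table instead.

```typescript
// Shape of one MCP server entry, mirroring the JSON snippet above.
interface ServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// Hypothetical helper: attaches `env` only when both credentials are given,
// so the anonymous case produces an entry with no env block at all.
function buildServerEntry(accessKey?: string, secretKey?: string): ServerEntry {
  const entry: ServerEntry = { command: "npx", args: ["-y", "mcp-wayback-machine"] };
  if (accessKey && secretKey) {
    entry.env = { WAYBACK_ACCESS_KEY: accessKey, WAYBACK_SECRET_KEY: secretKey };
  }
  return entry;
}

// Hypothetical helper: nests the entry under the harness's config key,
// e.g. "mcpServers" for Cursor or "context_servers" for Zed.
function wrapForHarness(
  configKey: string,
  entry: ServerEntry,
): Record<string, Record<string, ServerEntry>> {
  return { [configKey]: { "wayback-machine": entry } };
}

// Anonymous configuration for a harness that uses the "mcpServers" key.
console.log(JSON.stringify(wrapForHarness("mcpServers", buildServerEntry()), null, 2));
```

The output is meant to be merged into an existing config file by hand; a harness may expect additional fields that this sketch does not cover.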
As a CLI tool
npx mcp-wayback-machine save https://example.com
Or install globally:
npm install -g mcp-wayback-machine
wayback save https://example.com
Quick examples
What to ask the agent:
Archive https://example.com to the Wayback Machine
Find all archived snapshots of https://example.com from 2023
What's the earliest archived version of https://example.com?
Compare the oldest and newest snapshots of https://example.com
Check how many times https://example.com has been archived
Tools
save_url
Archive a URL to the Wayback Machine using the SPN2 API.
Parameters

| Parameter | Required | Description |
|---|---|---|
| url | Yes | The URL to archive |
| captureScreenshot | No | Capture a screenshot as a PNG image |
| captureOutlinks | No | Also archive up to 100 outlinked pages |
| ifNotArchivedWithin | No | Skip if archived within timeframe, e.g. "30d" |
| jsBehaviorTimeout | No | Run JavaScript for N seconds before capturing (max 30) |
| forceGet | No | Use simple HTTP GET instead of browser rendering |
| delayWbAvailability | No | Delay indexing ~12 hours to reduce server load |
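For readers wiring this up by hand, here is a minimal sketch of what a save_url invocation could look like as an MCP `tools/call` request. The parameter names come from the table above; their TypeScript types (booleans, a number for the timeout) are assumptions, and `saveUrlRequest` is a hypothetical helper rather than anything shipped by this package.

```typescript
// Argument shape for save_url. Names are from the table; types are assumed.
interface SaveUrlArgs {
  url: string;
  captureScreenshot?: boolean;
  captureOutlinks?: boolean;
  ifNotArchivedWithin?: string; // e.g. "30d"
  jsBehaviorTimeout?: number;   // seconds, at most 30
  forceGet?: boolean;
  delayWbAvailability?: boolean;
}

// JSON-RPC 2.0 envelope used by MCP for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: SaveUrlArgs };
}

// Hypothetical helper: checks the documented 30-second cap, then wraps the
// arguments in a tools/call request.
function saveUrlRequest(id: number, args: SaveUrlArgs): ToolCallRequest {
  const timeout = args.jsBehaviorTimeout;
  if (timeout !== undefined && (timeout < 0 || timeout > 30)) {
    throw new RangeError("jsBehaviorTimeout must be between 0 and 30 seconds");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "save_url", arguments: args },
  };
}

const request = saveUrlRequest(1, {
  url: "https://example.com",
  captureScreenshot: true,
  ifNotArchivedWithin: "30d",
});
console.log(JSON.stringify(request, null, 2));
```

In normal use an MCP client builds this envelope for you; the sketch is only meant to show how the table's parameters map onto the `arguments` object.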
get_archived_url
Retrieve an archived snapshot's content and metadata.
Parameters

| Parameter | Required | Description |
|---|---|---|
| url | Yes | The URL to retrieve |
| timestamp | No | Specific timestamp (YYYYMMDDhhmmss) or "latest" |
| modifier | No | URL modifier: id_ (raw), im_ (screenshot), js_ (JS), cs_ (CSS) |
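The timestamp and modifier combine into the familiar Wayback Machine playback URL of the form `https://web.archive.org/web/<timestamp><modifier>/<url>`. The sketch below shows that composition under the assumption that the server builds its URLs the same way; `toWaybackTimestamp` and `snapshotUrl` are hypothetical helper names, and the "latest" case is deliberately left out.

```typescript
type Modifier = "" | "id_" | "im_" | "js_" | "cs_";

// Formats a Date as a 14-digit UTC timestamp (YYYYMMDDhhmmss).
// toISOString() yields e.g. "2023-12-25T12:00:00.000Z"; stripping the
// separators and keeping the first 14 characters gives "20231225120000".
function toWaybackTimestamp(date: Date): string {
  return date.toISOString().replace(/[-:T]/g, "").slice(0, 14);
}

// Hypothetical helper: composes a playback URL for an explicit timestamp.
// The target URL is appended as-is, which is the usual Wayback form.
function snapshotUrl(url: string, timestamp: string, modifier: Modifier = ""): string {
  if (!/^\d{14}$/.test(timestamp)) {
    throw new Error("timestamp must be 14 digits (YYYYMMDDhhmmss)");
  }
  return `https://web.archive.org/web/${timestamp}${modifier}/${url}`;
}

const ts = toWaybackTimestamp(new Date(Date.UTC(2023, 11, 25, 12, 0, 0)));
console.log(ts); // 20231225120000
console.log(snapshotUrl("https://example.com", ts, "id_"));
// https://web.archive.org/web/20231225120000id_/https://example.com
```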
search_archives
Search the CDX API for archived versions of a URL.
Parameters

| Parameter | Required | Description |
|---|---|---|
| url | Yes | The URL pattern to search for |
| matchType | No | exact, prefix, host, or domain |
| from | No | Start date (YYYYMMDD or YYYY-MM-DD) |
| to | No | End date (YYYYMMDD or YYYY-MM-DD) |
| limit | No | Maximum results (default 10) |
| offset | No | Skip the first N results |
| collapse | No | Collapse duplicates, e.g. "timestamp:8" (per day), "digest" |
| filter | No | Filter by field regex, e.g. ["statuscode:200", "!mimetype:image.*"] |
| resolveRevisits | No | Resolve warc/revisit entries to original metadata |
| showDupeCount | No | Show duplicate count per capture |
| page | No | Page number for pagination |
| pageSize | No | Results per page |
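To make the mapping from these parameters to a CDX request concrete, here is a sketch that assembles a query URL for the public CDX endpoint. It assumes the tool forwards parameters under the same names and normalizes YYYY-MM-DD dates by dropping the hyphens; `cdxUrl` is a hypothetical helper, and only a subset of the parameters is covered.

```typescript
interface CdxQuery {
  url: string;
  matchType?: "exact" | "prefix" | "host" | "domain";
  from?: string;     // YYYYMMDD or YYYY-MM-DD
  to?: string;       // YYYYMMDD or YYYY-MM-DD
  limit?: number;    // the table above documents a default of 10
  collapse?: string;
  filter?: string[]; // each entry becomes its own filter= parameter
}

// Hypothetical helper: builds a CDX search URL with JSON output.
function cdxUrl(q: CdxQuery): string {
  const pairs: Array<[string, string]> = [
    ["url", q.url],
    ["output", "json"],
  ];
  if (q.matchType) pairs.push(["matchType", q.matchType]);
  // Assumed normalization: "2023-01-01" becomes "20230101".
  if (q.from) pairs.push(["from", q.from.replace(/-/g, "")]);
  if (q.to) pairs.push(["to", q.to.replace(/-/g, "")]);
  pairs.push(["limit", String(q.limit ?? 10)]);
  if (q.collapse) pairs.push(["collapse", q.collapse]);
  for (const f of q.filter ?? []) pairs.push(["filter", f]);

  const query = pairs.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
  return `https://web.archive.org/cdx/search/cdx?${query}`;
}

console.log(
  cdxUrl({
    url: "example.com",
    from: "2023-01-01",
    to: "2023-12-31",
    filter: ["statuscode:200"],
  }),
);
// https://web.archive.org/cdx/search/cdx?url=example.com&output=json&from=20230101&to=20231231&limit=10&filter=statuscode%3A200
```

Repeating `filter=` once per entry is how the CDX server accepts multiple filters, which is why the array is expanded rather than joined.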
check_archive_status
Check archival statistics for a URL — capture counts, yearly breakdowns, and first/last capture dates.
Parameters

| Parameter | Required | Description |
|---|---|---|
| url | Yes | The URL to check |
list_screenshots
List available screenshots for a URL.
Parameters

| Parameter | Required | Description |
|---|---|---|
| url | Yes | The URL to find screenshots for |
| limit | No | Maximum results (default 10) |
compare_snapshots
Compare two archived snapshots of a URL. Fetches the raw content of both and provides a visual diff URL.
Parameters

| Parameter | Required | Description |
|---|---|---|
| url | Yes | The URL to compare snapshots for |
| timestampA | No | First timestamp. Defaults to oldest available. |
| timestampB | No | Second timestamp. Defaults to newest available. |
clear_cache
Clear all cached API responses. Use when fresh data is needed or after saving a new URL.
Credentials
The server works anonymously by default. Set Internet Archive S3 credentials for higher rate limits on save operations:
export WAYBACK_ACCESS_KEY="your-access-key"
export WAYBACK_SECRET_KEY="your-secret-key"
To obtain credentials, log in to archive.org and visit your S3 API keys page.
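For context, Save Page Now 2 accepts Internet Archive S3 keys in an `Authorization: LOW <access>:<secret>` header. The sketch below shows how the two environment variables could be turned into request headers, falling back to anonymous access when either is missing. Whether this server assembles its headers exactly this way is an assumption, and `spn2Headers` is a hypothetical name.

```typescript
// Hypothetical helper: builds SPN2 request headers from an env-like object.
// Pass `process.env` in Node.js; a plain object keeps the sketch testable.
function spn2Headers(env: Record<string, string | undefined>): Record<string, string> {
  const headers: Record<string, string> = { Accept: "application/json" };
  const access = env["WAYBACK_ACCESS_KEY"];
  const secret = env["WAYBACK_SECRET_KEY"];
  // Only authenticate when both keys are present; otherwise stay anonymous.
  if (access && secret) {
    headers["Authorization"] = `LOW ${access}:${secret}`;
  }
  return headers;
}

console.log(spn2Headers({}));
console.log(
  spn2Headers({ WAYBACK_ACCESS_KEY: "your-access-key", WAYBACK_SECRET_KEY: "your-secret-key" }),
);
```

Treat both keys as secrets: avoid committing them to a config file that is checked into version control.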
CLI Usage
wayback save https://example.com
wayback get https://example.com
wayback get https://example.com --timestamp 20231225120000
wayback search https://example.com --from 2023-01-01 --to 2023-12-31 --limit 20
wayback status https://example.com
wayback screenshots https://example.com
wayback compare https://example.com
wayback compare https://example.com --timestamp-a 20230101000000 --timestamp-b 20240101000000
Technical Details
- Transport: stdio (MCP client integration)
- Caching: in-memory and disk-based with per-endpoint TTLs:
  - Snapshot content: 24 hours (immutable once captured)
  - Availability, CDX, sparkline: 1 hour (grows but never mutates)
  - Save operations: 30 minutes (idempotent per URL)
  - Save status polling: 30 seconds (changes during active jobs)
- Rate limiting: 15 requests per minute, with automatic Retry-After handling for 429 responses
- Validation: Zod schemas for all inputs and API responses
- Node.js 22+ required
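The caching and rate-limit behaviour listed above can be pictured with a small sketch: an in-memory cache whose entries expire after a per-endpoint TTL, plus a helper that turns a `Retry-After` header into a wait time. This is not the project's implementation. The endpoint key names, the `TtlCache` class, the `retryAfterMs` helper, and the 4-second fallback (60 seconds divided by 15 requests) are all assumptions, and the disk-based layer is omitted.

```typescript
// TTLs in milliseconds, taken from the list above. Key names are hypothetical.
const TTL_MS = {
  snapshotContent: 24 * 60 * 60 * 1000,
  cdx: 60 * 60 * 1000,
  save: 30 * 60 * 1000,
  saveStatus: 30 * 1000,
};
type Endpoint = keyof typeof TTL_MS;

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  set(endpoint: Endpoint, key: string, value: V): void {
    const expiresAt = this.now() + TTL_MS[endpoint];
    this.entries.set(`${endpoint}:${key}`, { value, expiresAt });
  }

  get(endpoint: Endpoint, key: string): V | undefined {
    const id = `${endpoint}:${key}`;
    const hit = this.entries.get(id);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(id); // expired: evict and report a miss
      return undefined;
    }
    return hit.value;
  }

  clear(): void {
    this.entries.clear();
  }
}

// Converts a Retry-After value (delta-seconds or HTTP-date) to milliseconds.
function retryAfterMs(header: string | null, fallbackMs = 4000, now = Date.now()): number {
  if (header === null || header.trim() === "") return fallbackMs;
  const seconds = Number(header);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  const date = Date.parse(header);
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - now);
}

const cache = new TtlCache<string>();
cache.set("saveStatus", "job-1", "pending");
console.log(cache.get("saveStatus", "job-1")); // pending
console.log(retryAfterMs("5")); // 5000
```

Keying entries by endpoint lets short-lived data such as save status expire quickly while immutable snapshot content stays cached for a day.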
Development
Requires pnpm and Node.js 22+.
pnpm install
pnpm validate # typecheck + lint + test + build
Resources
- Internet Archive Developer Portal
- CDX Server Documentation
- Save Page Now 2 (SPN2) API
- Bots, LLMs, and Automated Access
Related
- internet-archive-skills: Official Claude Code skill for uploading to, downloading from, and searching the Internet Archive via the ia Python CLI. It complements this project: the skill covers general Internet Archive operations, while this server exposes the Wayback Machine over MCP.
License
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
