mcp-wayback-machine

mcp
Security Audit
Passed
Health: Passed
  • License — License: NOASSERTION
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Community trust — 22 GitHub stars
Code: Passed
  • Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions: Passed
  • Permissions — No dangerous permissions requested


SUMMARY

MCP server and CLI tool for interacting with the Internet Archive's Wayback Machine

README.md

MCP Wayback Machine Server


An MCP (Model Context Protocol) server and CLI tool for interacting with the Internet Archive's Wayback Machine. Supports full CDX search, snapshot content retrieval, screenshot listing, snapshot comparison, and optional authentication for higher SPN2 rate limits.

Installation

As an MCP server

CLI shorthand

Some agent harnesses provide a one-command install:

Claude Code (MCP):

claude mcp add wayback-machine -- npx -y mcp-wayback-machine

Claude Code (plugin marketplace):

/plugin marketplace add https://github.com/Mearman/mcp-wayback-machine.git
/plugin install mcp-wayback-machine@mcp-wayback-machine

OpenAI Codex:

codex mcp add wayback-machine -- npx -y mcp-wayback-machine

To include optional credentials:

claude mcp add wayback-machine --env WAYBACK_ACCESS_KEY=xxx --env WAYBACK_SECRET_KEY=xxx -- npx -y mcp-wayback-machine
codex mcp add wayback-machine --env WAYBACK_ACCESS_KEY=xxx --env WAYBACK_SECRET_KEY=xxx -- npx -y mcp-wayback-machine

Manual configuration

For harnesses that use config files, add the following to the appropriate section:

{
  "wayback-machine": {
    "command": "npx",
    "args": ["-y", "mcp-wayback-machine"],
    "env": {
      "WAYBACK_ACCESS_KEY": "your-access-key",
      "WAYBACK_SECRET_KEY": "your-secret-key"
    }
  }
}

| Harness | Config file | Config key |
| --- | --- | --- |
| Claude Code | .mcp.json (project) / ~/.claude.json (user) | mcpServers |
| Codex | ~/.codex/config.toml | [mcp_servers.wayback-machine] |
| Gemini CLI | ~/.gemini/settings.json | mcpServers |
| Crush | .crush.json / ~/.config/crush/crush.json | mcp |
| Cline | .cline/mcp.json | mcpServers |
| Cursor | .cursor/mcp.json | mcpServers |
| Zed | ~/.config/zed/settings.json | context_servers |
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json | mcpServers |

The env block is optional — the server works anonymously without credentials. See Credentials for details.

As a CLI tool

npx mcp-wayback-machine save https://example.com

Or install globally:

npm install -g mcp-wayback-machine
wayback save https://example.com

Quick examples

What to ask the agent:

Archive https://example.com to the Wayback Machine
Find all archived snapshots of https://example.com from 2023
What's the earliest archived version of https://example.com?
Compare the oldest and newest snapshots of https://example.com
Check how many times https://example.com has been archived

Tools

save_url

Archive a URL to the Wayback Machine using the SPN2 API.

Parameters
| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | The URL to archive |
| captureScreenshot | No | Capture a screenshot as a PNG image |
| captureOutlinks | No | Also archive up to 100 outlinked pages |
| ifNotArchivedWithin | No | Skip if archived within timeframe, e.g. "30d" |
| jsBehaviorTimeout | No | Run JavaScript for N seconds before capturing (max 30) |
| forceGet | No | Use simple HTTP GET instead of browser rendering |
| delayWbAvailability | No | Delay indexing ~12 hours to reduce server load |
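
As a rough illustration of how these options might translate into an SPN2 request, the sketch below builds a form-encoded body for a POST to https://web.archive.org/save. The snake_case field names follow the public SPN2 API; the `SaveOptions` type and the `buildSaveBody` helper are hypothetical names introduced here, and the server's actual mapping may differ.

```typescript
// Hypothetical option shape mirroring the save_url parameters above.
interface SaveOptions {
  url: string;
  captureScreenshot?: boolean;
  captureOutlinks?: boolean;
  ifNotArchivedWithin?: string; // e.g. "30d"
  jsBehaviorTimeout?: number; // seconds, 0 to 30
  forceGet?: boolean;
  delayWbAvailability?: boolean;
}

// Build the form-encoded body for a Save Page Now (SPN2) request.
function buildSaveBody(opts: SaveOptions): URLSearchParams {
  const body = new URLSearchParams({ url: opts.url });
  if (opts.captureScreenshot) body.set("capture_screenshot", "1");
  if (opts.captureOutlinks) body.set("capture_outlinks", "1");
  if (opts.ifNotArchivedWithin) {
    body.set("if_not_archived_within", opts.ifNotArchivedWithin);
  }
  if (opts.jsBehaviorTimeout !== undefined) {
    // The table above documents a maximum of 30 seconds.
    if (opts.jsBehaviorTimeout < 0 || opts.jsBehaviorTimeout > 30) {
      throw new RangeError("jsBehaviorTimeout must be between 0 and 30");
    }
    body.set("js_behavior_timeout", String(opts.jsBehaviorTimeout));
  }
  if (opts.forceGet) body.set("force_get", "1");
  if (opts.delayWbAvailability) body.set("delay_wb_availability", "1");
  return body;
}

const exampleBody = buildSaveBody({
  url: "https://example.com",
  captureScreenshot: true,
  ifNotArchivedWithin: "30d",
});
console.log(exampleBody.toString());
```

Only options that are explicitly set end up in the body, so an anonymous, default capture sends nothing but `url`.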

get_archived_url

Retrieve an archived snapshot's content and metadata.

Parameters
| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | The URL to retrieve |
| timestamp | No | Specific timestamp (YYYYMMDDhhmmss) or "latest" |
| modifier | No | URL modifier: id_ (raw), im_ (screenshot), js_ (JS), cs_ (CSS) |
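
Wayback Machine replay URLs place the modifier directly after the timestamp, in the form `https://web.archive.org/web/<timestamp><modifier>/<url>`. A minimal sketch of assembling one, assuming a hypothetical `buildSnapshotUrl` helper (not part of this package) and leaving out the "latest" shorthand, whose handling inside the server is not documented here:

```typescript
// The modifiers listed in the table above.
type Modifier = "id_" | "im_" | "js_" | "cs_";

// Assemble a replay URL for one snapshot.
function buildSnapshotUrl(
  url: string,
  timestamp: string,
  modifier: Modifier | "" = "",
): string {
  // Require the full 14-digit YYYYMMDDhhmmss form.
  if (!/^\d{14}$/.test(timestamp)) {
    throw new Error(`Expected a 14-digit timestamp, got "${timestamp}"`);
  }
  return `https://web.archive.org/web/${timestamp}${modifier}/${url}`;
}

// Raw (unrewritten) content of a specific capture:
console.log(buildSnapshotUrl("https://example.com", "20231225120000", "id_"));
```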

search_archives

Search the CDX API for archived versions of a URL.

Parameters
| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | The URL pattern to search for |
| matchType | No | exact, prefix, host, or domain |
| from | No | Start date (YYYYMMDD or YYYY-MM-DD) |
| to | No | End date (YYYYMMDD or YYYY-MM-DD) |
| limit | No | Maximum results (default 10) |
| offset | No | Skip the first N results |
| collapse | No | Collapse duplicates, e.g. "timestamp:8" (per day), "timestamp:10" (per hour), "digest" |
| filter | No | Filter by field regex, e.g. ["statuscode:200", "!mimetype:image.*"] |
| resolveRevisits | No | Resolve warc/revisit entries to original metadata |
| showDupeCount | No | Show duplicate count per capture |
| page | No | Page number for pagination |
| pageSize | No | Results per page |
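
To make the parameters more concrete, here is a sketch of how a subset of them could map onto a CDX query URL, and how the JSON response (an array of rows whose first row holds the field names) can be turned into objects. `CdxQuery`, `buildCdxUrl`, and `parseCdxRows` are hypothetical names; the hyphen-stripping date normalization is an assumption about how YYYY-MM-DD input might be handled.

```typescript
// Hypothetical query shape covering a subset of the search_archives parameters.
interface CdxQuery {
  url: string;
  matchType?: "exact" | "prefix" | "host" | "domain";
  from?: string;
  to?: string;
  limit?: number;
  collapse?: string;
  filter?: string[];
}

// Strip hyphens so "2023-01-01" becomes "20230101".
const normalizeDate = (d: string): string => d.replace(/-/g, "");

function buildCdxUrl(q: CdxQuery): string {
  const params = new URLSearchParams({ url: q.url, output: "json" });
  if (q.matchType) params.set("matchType", q.matchType);
  if (q.from) params.set("from", normalizeDate(q.from));
  if (q.to) params.set("to", normalizeDate(q.to));
  if (q.limit !== undefined) params.set("limit", String(q.limit));
  if (q.collapse) params.set("collapse", q.collapse);
  // The CDX API accepts the filter parameter more than once.
  for (const f of q.filter ?? []) params.append("filter", f);
  return `https://web.archive.org/cdx/search/cdx?${params.toString()}`;
}

// With output=json the first row is the header; map the remaining rows to objects.
function parseCdxRows(rows: string[][]): Record<string, string>[] {
  const [header, ...data] = rows;
  if (!header) return [];
  return data.map((row) =>
    Object.fromEntries(header.map((name, i) => [name, row[i] ?? ""])),
  );
}

console.log(
  buildCdxUrl({
    url: "example.com",
    from: "2023-01-01",
    to: "2023-12-31",
    limit: 20,
    filter: ["statuscode:200"],
  }),
);
```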

check_archive_status

Check archival statistics for a URL — capture counts, yearly breakdowns, and first/last capture dates.

Parameters
| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | The URL to check |

list_screenshots

List available screenshots for a URL.

Parameters
| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | The URL to find screenshots for |
| limit | No | Maximum results (default 10) |

compare_snapshots

Compare two archived snapshots of a URL. Fetches the raw content of both and provides a visual diff URL.

Parameters
| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | The URL to compare snapshots for |
| timestampA | No | First timestamp. Defaults to oldest available. |
| timestampB | No | Second timestamp. Defaults to newest available. |

clear_cache

Clear all cached API responses. Use when fresh data is needed or after saving a new URL.

Credentials

The server works anonymously by default. Set Internet Archive S3 credentials for higher rate limits on save operations:

export WAYBACK_ACCESS_KEY="your-access-key"
export WAYBACK_SECRET_KEY="your-secret-key"

To obtain credentials, log in to archive.org and visit your S3 API keys page.
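
For illustration, the Internet Archive's S3-style APIs accept these keys in an `Authorization: LOW <access>:<secret>` header. A minimal sketch of deriving that header from the environment, assuming a hypothetical `buildAuthHeader` helper (the server's own implementation is not shown in this README):

```typescript
// Returns the Authorization header value, or undefined to stay anonymous.
function buildAuthHeader(
  env: Record<string, string | undefined>,
): string | undefined {
  const access = env["WAYBACK_ACCESS_KEY"];
  const secret = env["WAYBACK_SECRET_KEY"];
  // Both keys are needed; with only one set, fall back to anonymous access.
  if (!access || !secret) return undefined;
  return `LOW ${access}:${secret}`;
}

console.log(buildAuthHeader({}) ?? "anonymous");
```

In a Node.js process the helper would typically be called as `buildAuthHeader(process.env)`.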

CLI Usage

wayback save https://example.com
wayback get https://example.com
wayback get https://example.com --timestamp 20231225120000
wayback search https://example.com --from 2023-01-01 --to 2023-12-31 --limit 20
wayback status https://example.com
wayback screenshots https://example.com
wayback compare https://example.com
wayback compare https://example.com --timestamp-a 20230101000000 --timestamp-b 20240101000000
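
The timestamp flags take the 14-digit YYYYMMDDhhmmss form, which the Wayback Machine interprets as UTC. When the starting point is a JavaScript `Date`, a small conversion like the following can produce that form; `toWaybackTimestamp` is a hypothetical helper, not something the CLI exports.

```typescript
// Format a Date as a 14-digit Wayback timestamp (YYYYMMDDhhmmss, UTC).
function toWaybackTimestamp(date: Date): string {
  // toISOString() yields e.g. "2023-12-25T12:00:00.000Z";
  // keep everything up to the seconds and drop the separators.
  return date.toISOString().slice(0, 19).replace(/[-:T]/g, "");
}

const christmasNoon = new Date(Date.UTC(2023, 11, 25, 12, 0, 0));
console.log(toWaybackTimestamp(christmasNoon));
```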

Technical Details

  • Transport: stdio (MCP client integration)
  • Caching: in-memory and disk-based with per-endpoint TTLs:
    • Snapshot content: 24 hours (immutable once captured)
    • Availability, CDX, sparkline: 1 hour (grows but never mutates)
    • Save operations: 30 minutes (idempotent per URL)
    • Save status polling: 30 seconds (changes during active jobs)
  • Rate limiting: 15 requests per minute, with automatic Retry-After handling for 429 responses
  • Validation: Zod schemas for all inputs and API responses
  • Node.js 22+ required
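
The per-endpoint TTLs and the Retry-After handling listed above can be sketched as follows. This is an in-memory illustration only, under stated assumptions: the endpoint key names, the `TtlCache` class, and the `retryAfterMs` function are hypothetical, and the disk-based layer and the 15-requests-per-minute limiter are not modeled.

```typescript
// TTLs in milliseconds, taken from the list above; the key names are hypothetical.
const TTL_MS = {
  snapshot: 24 * 60 * 60 * 1000,
  cdx: 60 * 60 * 1000,
  save: 30 * 60 * 1000,
  saveStatus: 30 * 1000,
} as const;

type Endpoint = keyof typeof TTL_MS;

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  set(endpoint: Endpoint, key: string, value: V): void {
    const expiresAt = this.now() + TTL_MS[endpoint];
    this.entries.set(`${endpoint}:${key}`, { value, expiresAt });
  }

  get(endpoint: Endpoint, key: string): V | undefined {
    const id = `${endpoint}:${key}`;
    const hit = this.entries.get(id);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      // Expired entries are dropped lazily on read.
      this.entries.delete(id);
      return undefined;
    }
    return hit.value;
  }

  clear(): void {
    this.entries.clear();
  }
}

// Parse a Retry-After header (delta-seconds or an HTTP date) into milliseconds to wait.
function retryAfterMs(header: string, now: number = Date.now()): number {
  const seconds = Number(header);
  if (header.trim() !== "" && Number.isFinite(seconds)) {
    return Math.max(0, seconds * 1000);
  }
  const date = Date.parse(header);
  return Number.isNaN(date) ? 0 : Math.max(0, date - now);
}
```

Keying entries by endpoint plus request key lets a single map hold responses with different lifetimes, and `clear()` corresponds to what the clear_cache tool is described as doing.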

Development

Requires pnpm and Node.js 22+.

pnpm install
pnpm validate     # typecheck + lint + test + build

Resources

Related

  • internet-archive-skills: Official Claude Code skill for uploading to, downloading from, and searching the Internet Archive via the ia Python CLI. It complements this server: the skill covers general Internet Archive operations, while this server exposes the Wayback Machine over MCP.

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
