librecrawl-mcp

mcp
Guvenlik Denetimi
Uyari
Health Uyari
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code Gecti
  • Code scan — Scanned 2 files during light audit, no dangerous patterns found
Permissions Gecti
  • Permissions — No dangerous permissions requested

Bu listing icin henuz AI raporu yok.

SUMMARY

Self-hosted SEO crawler for Claude — Screaming Frog-level audits, unlimited pages, 1-click install. 19 MCP tools.

README.md

LibreCrawl MCP

Self-hosted SEO crawler for Claude — Screaming Frog-level audits, zero per-crawl cost.

Wraps LibreCrawl as a Claude MCP server. Give Claude the ability to fully audit any website — broken links, canonical issues, image alt text, orphan pages, Core Web Vitals, Schema.org, GSC errors — all running on your own server.

License: MIT
MCP
LibreCrawl


vs Screaming Frog

Check Screaming Frog Free Screaming Frog Paid LibreCrawl MCP
Pages 500 limit Unlimited Unlimited
Price Free (capped) £149/yr Free + self-host
Broken links (4xx/5xx) + source page
Redirect chains
Missing/duplicate title tags
Missing/duplicate meta descriptions
Title / meta too long or too short
Missing H1
H1 ↔ Title keyword mismatch
Canonical analysis (missing, self, non-self, broken)
Thin content
Response time / slow pages
URL quality (uppercase, length, params)
Page depth warnings
Noindex page detection
Image alt text
Broken images
Orphan pages (no inbound links)
Open Graph tags
Viewport meta (mobile)
Hreflang detection
robots.txt (rules, crawl-delay, sitemap)
sitemap.xml
HTTPS redirect check
www/non-www canonicalization
Core Web Vitals (PSI)
Schema.org / JSON-LD
GSC indexing errors ✅ (via MCP)
Analytics tag detection (GA4, GTM, Pixel)
AI-generated fix checklist
Natural language audit report
Fully automated (one command)

What LibreCrawl MCP gives you

LibreCrawl is a powerful self-hosted SEO crawler. On its own you get a web UI and raw crawl data. This MCP layer turns it into a fully automated Claude-native audit engine — one sentence to Claude gets you a complete site audit, saved as a Markdown report, ready to act on.

19 tools, zero context-switching

Claude gains 19 specialised SEO tools. Ask in plain English — Claude picks the right tool, runs it, interprets the output, and gives you a fix plan. No dashboards to navigate, no exports to download, no manual cross-referencing.

  • librecrawl_audit() — one call: crawls the site, runs site-level checks, generates a structured Markdown report with a prioritised fix checklist. Done.
  • Persistent authenticated session across all tool calls — no re-login between steps
  • Every crawl saved to SQLite with full history via librecrawl_list_crawls()

Report engine — 35+ checks, 17 sections

LibreCrawl's raw JSON export gets transformed into a structured audit report covering everything Screaming Frog covers, plus things it doesn't:

Section What it catches
Summary scorecard 30+ metrics with pass/fail at a glance
Critical issues Broken pages + which pages link to them, duplicate titles, bad canonicals
On-page warnings Meta length, missing H1, thin content, slow response time
H1 ↔ Title alignment Keyword mismatch check — flags pages where headings contradict title intent
Canonical analysis Missing, self-referencing, non-self, canonical → broken URL
Images Missing alt text (per-page count + total), broken image references
Noindex detection Every page with robots: noindex — catch accidental SEO blackouts
Orphan pages Pages with zero inbound links — invisible to Googlebot
Redirect chains Multi-hop chains (A→B→C), not just 3xx presence
Open Graph + Viewport Social sharing tags, mobile viewport — affects CTR and ranking
Heading structure Pages with content but no H2s, broken H1→H2 hierarchy
Analytics coverage Pages missing GA4 or GTM — tracking blind spots
URL quality Uppercase slugs, long URLs, parameter-heavy URLs, depth >4
Site-level checks robots.txt rules, sitemap.xml, HTTPS redirect, www canonicalisation
Hreflang Language variant detection for multilingual sites
Issue type breakdown LibreCrawl's 1,600+ built-in issue detectors, counted by type
Fix checklist Auto-prioritised P1→Pn task list — paste straight into your tracker

Beyond crawl data — 6 specialist tools

Tool Value
librecrawl_site_check Instant technical health — robots.txt, sitemap, HTTPS, www. No crawl needed.
librecrawl_pagespeed Google PageSpeed Insights: Core Web Vitals, LCP/CLS/INP, lab + field data
librecrawl_pagespeed_audit Batch CWV across top pages — ranked worst first, throttled to stay in free quota
librecrawl_schema_check Extract all JSON-LD from a page, map to rich results they unlock (FAQ, Product, Breadcrumb…)
librecrawl_schema_audit Schema coverage across 50 pages at once
librecrawl_internal_links_analysis Internal authority map: top-linked pages, dead ends, orphans, anchor text patterns

GSC integration

Connect any Google Search Console MCP (we recommend mcp-gsc) and Claude will pull your real indexing errors — 404s in sitemap, server errors, soft 404s, crawl blocks — and append them to the audit report with fix hints. One conversation, complete picture.


Quick Start — Your First Audit

This guide takes you from zero to a full SEO audit report in under 10 minutes.

Step 1 — Install (1 command)

curl -fsSL https://raw.githubusercontent.com/adityaarsharma/librecrawl-mcp/main/install.sh | bash

What happens:

  • LibreCrawl Docker image is pulled and built (~5–8 min, first run only)
  • Python MCP server is installed and registered with PM2
  • Both services start automatically and survive reboots

You'll be asked for a Google PageSpeed Insights API key (optional but recommended — free at console.cloud.google.com, 25k req/day).

Verify everything is running:

pm2 status librecrawl-mcp          # MCP server — should show "online"
docker ps | grep librecrawl        # LibreCrawl — should show "healthy"

Step 2 — Add to Claude

Claude Desktop — edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "librecrawl": {
      "type": "http",
      "url": "http://127.0.0.1:5081/mcp"
    }
  }
}

Claude Code — edit ~/.claude/settings.json:

{
  "mcpServers": {
    "librecrawl": {
      "type": "http",
      "url": "http://127.0.0.1:5081/mcp"
    }
  }
}

Restart Claude Desktop after saving. In Claude Code, the MCP is live immediately.


Step 3 — Run your first audit

Open Claude and type:

Audit https://example.com and give me a full SEO report

Claude will:

  1. Crawl the entire site (runs in background, progress visible in pm2 logs librecrawl-mcp)
  2. Run site-level checks (robots.txt, sitemap, HTTPS, www redirect)
  3. Analyse all 35+ SEO checks across every crawled page
  4. Save a Markdown report to ~/librecrawl-reports/example.com-YYYYMMDD-HHMMSS.md
  5. Return a summary of the top issues with a prioritised fix checklist

For large sites (1,000+ pages), the crawl can run up to 2 hours. To cap it:

Audit https://bigsite.com — limit to 500 pages

Step 4 — Read your report

The report is saved at ~/librecrawl-reports/. Each report has:

Section What it shows
Scorecard Pass/fail for every check at a glance
Critical issues Broken pages (404/5xx) + which pages link to them
On-page warnings Missing titles, duplicate metas, thin content, slow pages
Canonical analysis Missing, self, non-self, broken canonical URLs
Orphan pages Pages with zero inbound links — Googlebot may never find them
Redirect chains Multi-hop chains (A→B→C) eating crawl budget
Images Pages with missing alt text, broken image references
Analytics coverage Pages not tracked by GA4 or GTM
Fix checklist P1→Pn priority tasks — paste straight into Jira/ClickUp

Step 5 — Add GSC errors (optional)

Install a Google Search Console MCP (mcp-gsc is recommended), then ask:

Audit https://example.com and include GSC indexing errors

Claude will merge your real GSC coverage errors — 404s in sitemap, soft 404s, server errors, crawl blocks — into the audit report automatically.


Other things to ask

Check Core Web Vitals on the top 10 pages of example.com
Does example.com have schema markup? What rich results is it missing?
Run an internal links analysis — which pages are orphans?
Quick site health check for example.com — no crawl needed

Remote server setup (optional)

For local installs see the Quick Start above. To expose your LibreCrawl MCP over Nginx (e.g. on a VPS so your whole team can use it):

Claude config (via mcp-remote):

{
  "mcpServers": {
    "librecrawl": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://your-domain.com/librecrawl/mcp"]
    }
  }
}

Nginx location block for remote:

location /librecrawl/ {
    proxy_pass              http://127.0.0.1:5081/;
    proxy_http_version      1.1;
    proxy_set_header        Host $host;
    proxy_read_timeout      600s;
    proxy_buffering         off;
    proxy_cache             off;
    chunked_transfer_encoding on;
}

Add Google Search Console (optional but recommended)

LibreCrawl MCP can merge GSC coverage errors into your audit report via librecrawl_append_gsc_section. To use it, connect a GSC MCP server to Claude first.

Recommended: mcp-gsc (AminForou)

The best community GSC MCP — 876+ stars, maintained, supports OAuth and service account auth.

Install:

pip install mcp-search-console
# or with uvx (no pip needed):
# uvx mcp-search-console

Option A — OAuth (interactive, easiest):

  1. Create a project at console.cloud.google.com
  2. Enable Google Search Console API
  3. Create OAuth 2.0 Client ID → Desktop app → download credentials.json
  4. Add to Claude config:
{
  "mcpServers": {
    "gsc": {
      "command": "uvx",
      "args": ["mcp-search-console"],
      "env": {
        "GOOGLE_CREDENTIALS_FILE": "/path/to/credentials.json"
      }
    }
  }
}

First run opens a browser for Google auth. Token is cached.

Option B — Service Account (automation-friendly):

  1. Create a service account at console.cloud.google.com → IAM → Service Accounts
  2. Download the JSON key file
  3. Add the service account email as a user in your GSC property (Settings → Users and permissions)
  4. Add to Claude config:
{
  "mcpServers": {
    "gsc": {
      "command": "uvx",
      "args": ["mcp-search-console"],
      "env": {
        "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/service-account.json"
      }
    }
  }
}

Using GSC with audit reports

Once connected, Claude can pull GSC errors and merge them into any audit:

"Audit uichemy.com, include GSC indexing errors"

Claude will:

  1. Run librecrawl_audit("https://uichemy.com") → gets report_path
  2. Pull GSC coverage errors via the GSC MCP
  3. Call librecrawl_append_gsc_section(report_path, gsc_data) → adds a GSC section to the report

The GSC section includes: indexing errors with fix hints, crawl errors, manual actions, and a prioritised fix checklist.


Tools (19 total)

Tool What it does
librecrawl_audit One-call full audit — crawl + site checks + 30+ checks + report
librecrawl_site_check Instant: robots.txt, sitemap, HTTPS, www — no crawl needed
librecrawl_generate_report Re-generate report from a past crawl
librecrawl_start_crawl Start async crawl, returns crawl_id
librecrawl_get_status Poll crawl progress
librecrawl_pause_crawl Pause a running crawl
librecrawl_resume_crawl Resume a paused crawl
librecrawl_stop_crawl Stop running crawl
librecrawl_export_results Raw JSON export
librecrawl_list_crawls List all saved crawls
librecrawl_get_settings Show current crawler settings
librecrawl_filter_issues Exclude false-positive patterns from results
librecrawl_visualization_data Site link graph (nodes + edges)
librecrawl_internal_links_analysis Internal authority map — top linked pages, orphans, dead ends, anchor text
librecrawl_pagespeed Core Web Vitals for one URL (Google PSI API)
librecrawl_pagespeed_audit Batch CWV for up to 25 URLs, ranked worst-first
librecrawl_schema_check Schema.org / JSON-LD for one URL — rich result mapping
librecrawl_schema_audit Schema coverage across multiple URLs
librecrawl_append_gsc_section Merge GSC indexing errors into any audit report

Architecture

Claude
  │  MCP (streamable-http)
  ▼
Python MCP server (port 5081)
  │  REST API
  ▼
LibreCrawl Flask app (port 5080, Docker)
  │  Headless crawl (Playwright + Chromium)
  ▼
Target website

Stack: LibreCrawl · FastMCP · httpx · uvicorn · PM2 · Docker


Manage services

# Status
pm2 status librecrawl-mcp
docker ps | grep librecrawl

# Logs
pm2 logs librecrawl-mcp
docker logs librecrawl --tail 50

# Restart
pm2 restart librecrawl-mcp
docker restart librecrawl

# Stop
pm2 stop librecrawl-mcp
docker stop librecrawl

Configuration

Env var Default Description
INSTALL_DIR ~/librecrawl-mcp Where to install
LIBRECRAWL_PORT 5080 LibreCrawl internal port
MCP_PORT 5081 MCP server port
PAGESPEED_API_KEY Google PSI API key (free at console.cloud.google.com)
REPORTS_DIR ~/librecrawl-reports Where Markdown reports are saved

Related


License

MIT — use freely, attribution appreciated.

Built by Aditya Sharma

Yorumlar (0)

Sonuc bulunamadi