Web Research Assistant MCP Server

A Model Context Protocol (MCP) server for web research and discovery.
It provides 13 tools, 4 resources, and 5 prompts for searching, crawling, and analyzing web content, backed by a local SearXNG
instance (typically run in Docker), Exa AI neural search, the crawl4ai project, and the Pixabay API. The tools are:

web_search — federated search across multiple engines via SearXNG
search_examples — find code examples, tutorials, and articles (defaults to recent content)
search_images — find high-quality stock photos, illustrations, and vectors via Pixabay
crawl_url — full page content extraction with advanced crawling
package_info — detailed package metadata from npm, PyPI, crates.io, Go
package_search — discover packages by keywords and functionality
github_repo — repository health metrics and development activity
translate_error — find solutions for error messages and stack traces from Stack Overflow (auto-detects CORS, fetch, and web errors)
api_docs — auto-discover and crawl official API documentation with examples (works for any API - no hardcoded URLs)
extract_data — extract structured data (tables, lists, fields, JSON-LD) from web pages with automatic detection
compare_tech — compare technologies side-by-side with NPM downloads, GitHub stars, and aspect analysis (React vs Vue, PostgreSQL vs MongoDB, etc.)
get_changelog — NEW! Get release notes and changelogs with breaking change detection (upgrade safely from version X to Y)
check_service_status — NEW! Instant health checks for 25+ services (Stripe, AWS, GitHub, OpenAI, etc.) - "Is it down or just me?"

All tools feature comprehensive error handling, response size limits, usage tracking, and clear documentation
for optimal AI agent integration.

MCP Resources (Direct Data Lookups)

package://{registry}/{name} - Package info from npm, PyPI, crates.io, or Go modules
github://{owner}/{repo} - Repository information and health metrics
status://{service} - Service health status for 25+ services
changelog://{registry}/{package} - Package release notes and changelogs

MCP Prompts (Reusable Workflows)

research_package - Comprehensive package evaluation
debug_error - Structured error debugging with solutions
compare_technologies - Side-by-side technology comparison
evaluate_repository - GitHub repository health assessment
check_service_health - Multi-service status monitoring

Quick Start

Option 1: Full Docker Setup (Recommended)

Everything runs in Docker - no Python installation needed:

./docker-start.sh

This starts both SearXNG and the MCP server in containers. See DOCKER_SETUP.md for details.

Option 2: Python + Docker SearXNG

Set up SearXNG (5 minutes):
```
# Using Docker (recommended)
docker run -d -p 2288:8080 searxng/searxng:latest
```
Then configure search engines - see SEARXNG_SETUP.md for optimized settings.

Install the MCP server:

uvx web-research-assistant  # or: pip install web-research-assistant

Configure Claude Desktop - add to claude_desktop_config.json:

{
  "mcpServers": {
    "web-research-assistant": {
      "command": "uvx",
      "args": ["web-research-assistant"]
    }
  }
}

Restart Claude Desktop and start researching!

⚠️ For best results: Configure SearXNG with GitHub, Stack Overflow, and other code-focused search engines. See SEARXNG_SETUP.md for the recommended configuration.

Prerequisites

Required

Python 3.10+
A running SearXNG instance on http://localhost:2288
- 📖 See SEARXNG_SETUP.md for complete Docker setup guide
- ⚠️ IMPORTANT: For best results, enable these search engines in SearXNG:
  - GitHub, Stack Overflow, GitLab (for code search - critical!)
  - DuckDuckGo, Brave (for web search)
  - MDN, Wikipedia (for documentation)
  - Reddit, HackerNews (for tutorials and discussions)
  - See SEARXNG_SETUP.md for the full optimized configuration
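
Before wiring up the server, it can help to confirm that SearXNG answers on the expected port. The sketch below builds a query URL for the `/search` endpoint and fetches one page of results. It assumes the JSON output format is enabled in your SearXNG settings (some configurations only allow HTML and will reject the request); the helper names are illustrative and not part of this project.

```python
import json
import urllib.parse
import urllib.request

SEARXNG_SEARCH_URL = "http://localhost:2288/search"  # default endpoint from this README


def build_search_url(query: str, category: str = "general",
                     base_url: str = SEARXNG_SEARCH_URL) -> str:
    """Return a SearXNG query URL that asks for JSON results."""
    params = {"q": query, "categories": category, "format": "json"}
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def smoke_test(query: str = "model context protocol") -> int:
    """Fetch one page of results and return how many hits came back."""
    with urllib.request.urlopen(build_search_url(query), timeout=10) as resp:
        payload = json.load(resp)
    return len(payload.get("results", []))
```

Calling `smoke_test()` from a Python shell should return a non-zero count once the search engines are configured; an HTTP error usually points at the port mapping or the output format setting.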

Optional

Exa API key for neural search (recommended for better search results); get one at dashboard.exa.ai
Pixabay API key for image search; a free key is available at pixabay.com/api/docs
Playwright browsers for advanced crawling (auto-installed with crawl4ai-setup)

Developer Setup (if running from source)

pip install uv       # if you do not already have uv
uv sync              # creates the virtual environment
uv run crawl4ai-setup  # installs Chromium for crawl4ai

You can also use pip install -r requirements.txt if you prefer pip over uv.

Installation

Option 1: Using uvx (Recommended - No installation needed!)

uvx web-research-assistant

This runs the server directly from PyPI without installing it globally.

Option 2: Install with pip

pip install web-research-assistant
web-research-assistant

Option 3: Install with uv

uv tool install web-research-assistant
web-research-assistant

By default the server communicates over stdio, which makes it easy to wire into
Claude Desktop or any other MCP host.
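
Over stdio, an MCP host exchanges newline-delimited JSON-RPC 2.0 messages with the server. As a rough illustration, this is what a `tools/call` request for `web_search` could look like on the wire. The `build_tool_call` helper is hypothetical; in practice an MCP client library handles this framing, along with the initialization handshake that must come first.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single JSON-RPC line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport frames messages by newline, so the JSON must stay
    # on one line; json.dumps emits no raw newlines by default.
    return json.dumps(message) + "\n"


line = build_tool_call(
    1,
    "web_search",
    {"query": "python asyncio tutorial", "reasoning": "find learning material"},
)
print(line, end="")
```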

MCP Client Configuration

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location):

Option 1: Using uvx (Recommended - No installation needed!)

{
  "mcpServers": {
    "web-research-assistant": {
      "command": "uvx",
      "args": ["web-research-assistant"]
    }
  }
}

Option 2: Using installed package

{
  "mcpServers": {
    "web-research-assistant": {
      "command": "web-research-assistant"
    }
  }
}
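
If the config file already lists other servers, the entry has to be merged in rather than pasted over the whole file. A minimal sketch of that merge, using a hypothetical `add_server` helper that works on the parsed JSON and leaves existing entries untouched:

```python
import copy
import json


def add_server(config: dict, name: str, command: str, args=None) -> dict:
    """Return a copy of config with one MCP server entry added or replaced."""
    merged = copy.deepcopy(config)
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    merged.setdefault("mcpServers", {})[name] = entry
    return merged


# "other-server" stands in for whatever is already configured.
existing = {"mcpServers": {"other-server": {"command": "other"}}}
updated = add_server(existing, "web-research-assistant", "uvx",
                     ["web-research-assistant"])
print(json.dumps(updated, indent=2))
```

Reading the real file with `json.load`, passing the result through `add_server`, and writing it back with `json.dump` keeps any unrelated settings in place.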

OpenCode

Add to ~/.config/opencode/opencode.json:

Using uvx (Recommended)

{
  "mcp": {
    "web-research-assistant": {
      "type": "local",
      "command": ["uvx", "web-research-assistant"],
      "enabled": true
    }
  }
}

Using installed package

{
  "mcp": {
    "web-research-assistant": {
      "type": "local",
      "command": ["web-research-assistant"],
      "enabled": true
    }
  }
}

Development (Running from source)

For Claude Desktop:

{
  "mcpServers": {
    "web-research-assistant": {
      "command": "uv",
      "args": [
        "--directory",
        "/ABSOLUTE/PATH/TO/web-research-assistant",
        "run",
        "web-research-assistant"
      ]
    }
  }
}

For OpenCode:

{
  "mcp": {
    "web-research-assistant": {
      "type": "local",
      "command": [
        "uv",
        "--directory",
        "/ABSOLUTE/PATH/TO/web-research-assistant",
        "run",
        "web-research-assistant"
      ],
      "enabled": true
    }
  }
}

Restart your MCP client afterwards. The MCP tools will be available immediately.

Tool behavior

| Tool | When to use | Arguments |
| --- | --- | --- |
| `web_search` | Use first to gather recent information and URLs from SearXNG. Returns 1–10 ranked snippets with clickable URLs. | `query` (required), `reasoning` (required), optional `category` (defaults to `general`), and `max_results` (defaults to 5). |
| `search_examples` | Find code examples, tutorials, and technical articles. Optimized for technical content with optional time filtering. Perfect for learning APIs or finding usage patterns. | `query` (required, e.g., "Python async examples"), `reasoning` (required), `content_type` (code/articles/both, defaults to both), `time_range` (day/week/month/year/all, defaults to all), optional `max_results` (defaults to 5). |
| `search_images` | Find high-quality royalty-free stock images from Pixabay. Returns photos, illustrations, or vectors. Requires `PIXABAY_API_KEY` environment variable. | `query` (required, e.g., "mountain landscape"), `reasoning` (required), `image_type` (all/photo/illustration/vector, defaults to all), `orientation` (all/horizontal/vertical, defaults to all), optional `max_results` (defaults to 10). |
| `crawl_url` | Call immediately after search when you need the actual article body for quoting, summarizing, or extracting data. | `url` (required), `reasoning` (required), optional `max_chars` (defaults to 8000 characters). |
| `package_info` | Look up specific npm, PyPI, crates.io, or Go package metadata including version, downloads, license, and dependencies. Use when you know the package name. | `name` (required package name), `reasoning` (required), `registry` (npm/pypi/crates/go, defaults to npm). |
| `package_search` | Search for packages by keywords or functionality (e.g., "web framework", "json parser"). Use when you need to find packages that solve a specific problem. | `query` (required search terms), `reasoning` (required), `registry` (npm/pypi/crates/go, defaults to npm), optional `max_results` (defaults to 5). |
| `github_repo` | Get GitHub repository health metrics including stars, forks, issues, recent commits, and project details. Use when evaluating open source projects. | `repo` (required, owner/repo or full URL), `reasoning` (required), optional `include_commits` (defaults to true). |
| `translate_error` | Find Stack Overflow solutions for error messages and stack traces. Auto-detects language/framework, extracts key terms (CORS, map, undefined, etc.), filters irrelevant results, and prioritizes Stack Overflow solutions. Handles web-specific errors (CORS, fetch). | `error_message` (required stack trace or error text), `reasoning` (required), optional `language` (auto-detected), optional `framework` (auto-detected), optional `max_results` (defaults to 5). |
| `api_docs` | Auto-discover and crawl official API documentation. Dynamically finds docs URLs using patterns (docs.{api}.com, {api}.com/docs, etc.), searches for specific topics, crawls pages, and extracts overview, parameters, examples, and related links. Works for ANY API - no hardcoded URLs. Perfect for API integration and learning. | `api_name` (required, e.g., "stripe", "react"), `topic` (required, e.g., "create customer", "hooks"), `reasoning` (required), optional `max_results` (defaults to 2 pages). |
| `extract_data` | Extract structured data from HTML pages. Supports tables, lists, fields (via CSS selectors), JSON-LD, and auto-detection. Returns clean JSON output. More efficient than parsing full page text. Perfect for scraping pricing tables, package specs, release notes, or any structured content. | `url` (required), `reasoning` (required), `extract_type` (table/list/fields/json-ld/auto, defaults to auto), optional `selectors` (CSS selectors for fields mode), optional `max_items` (defaults to 100). |
| `compare_tech` | Compare 2-5 technologies side-by-side. Auto-detects category (framework/database/language) and gathers data from NPM, GitHub, and web search. Returns structured comparison with popularity metrics (downloads, stars), performance insights, and best-use summaries. Fast parallel processing (3-4s). | `technologies` (required list of 2-5 names), `reasoning` (required), optional `category` (auto-detects if not provided), optional `aspects` (auto-selected by category), optional `max_results_per_tech` (defaults to 3). |
| `get_changelog` | NEW! Get release notes and changelogs for package upgrades. Fetches GitHub releases, highlights breaking changes, and provides upgrade recommendations. Answers "What changed in version X → Y?" and "Are there breaking changes?" Perfect for planning dependency updates. | `package` (required name), `reasoning` (required), optional `registry` (npm/pypi/auto, defaults to auto), optional `max_releases` (defaults to 5). |
| `check_service_status` | NEW! Instantly check if external services are experiencing issues. Covers 25+ popular services (Stripe, AWS, GitHub, OpenAI, Vercel, etc.). Returns operational status, current incidents, and component health. Critical for production debugging - know immediately if the issue is external. Response time < 2s. | `service` (required name, e.g., "stripe", "aws"), `reasoning` (required). |
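
To make the `api_docs` discovery step more concrete, here is a small sketch of pattern-based candidate generation. It uses only the two URL patterns named in the table; the function name, the name normalization, and the ordering are assumptions, and the real tool may probe further patterns or combine this with search.

```python
def candidate_doc_urls(api_name: str) -> list[str]:
    """Build documentation URL guesses from the two patterns named above."""
    # Lowercasing and dropping spaces is an assumption of this sketch.
    slug = api_name.strip().lower().replace(" ", "")
    return [
        f"https://docs.{slug}.com",
        f"https://{slug}.com/docs",
    ]


print(candidate_doc_urls("Stripe"))
```

Each candidate would still need to be fetched and checked before being treated as the official documentation.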

Results are automatically trimmed (8,000 characters by default, controlled by `MCP_MAX_RESPONSE_CHARS`)
so they stay well within MCP response expectations. If truncation happens, the text ends with a note
reminding the model that more detail is available on request.
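
The trimming behavior can be pictured with a short sketch. The `clamp_text` name and the wording of the note are assumptions made for illustration, not the server's actual implementation.

```python
TRUNCATION_NOTE = "\n\n[Output truncated; ask for more detail if needed.]"


def clamp_text(text: str, max_chars: int = 8000) -> str:
    """Trim text to max_chars, appending a note when anything was cut."""
    if len(text) <= max_chars:
        return text
    # Reserve room for the note so the result stays within max_chars
    # (this holds for budgets larger than the note itself).
    keep = max(0, max_chars - len(TRUNCATION_NOTE))
    return text[:keep].rstrip() + TRUNCATION_NOTE


trimmed = clamp_text("x" * 10_000)
print(len(trimmed))
```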

Resources

MCP Resources provide direct data access via URI templates - perfect for quick lookups without tool calls.

| Resource URI | Description | Example |
| --- | --- | --- |
| `package://{registry}/{name}` | Package metadata (version, downloads, license, dependencies) | `package://npm/express` |
| `github://{owner}/{repo}` | Repository info (stars, forks, issues, activity) | `github://facebook/react` |
| `status://{service}` | Service health status | `status://stripe` |
| `changelog://{registry}/{package}` | Release notes and changelogs | `changelog://npm/typescript` |
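
As an illustration of how these URI templates break down, the sketch below splits a resource URI into its scheme and template fields with the standard library. The `parse_resource_uri` helper is hypothetical; MCP hosts and servers normally do this matching for you.

```python
from urllib.parse import urlsplit


def parse_resource_uri(uri: str) -> tuple[str, list[str]]:
    """Split a resource URI into its scheme and path segments."""
    parts = urlsplit(uri)
    # The first template field lands in netloc, the rest in path.
    segments = [parts.netloc] + [s for s in parts.path.split("/") if s]
    return parts.scheme, segments


print(parse_resource_uri("package://npm/express"))
print(parse_resource_uri("status://stripe"))
```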

Prompts

MCP Prompts are reusable message templates that guide AI assistants through common workflows.

| Prompt | Arguments | Use Case |
| --- | --- | --- |
| `research_package` | `package_name`, `registry` | Evaluate a package before adding it as a dependency |
| `debug_error` | `error_message`, `language` (optional), `framework` (optional) | Debug an error with context and solutions |
| `compare_technologies` | `tech1`, `tech2`, `tech3` (optional), `tech4` (optional), `tech5` (optional) | Compare frameworks, databases, or languages |
| `evaluate_repository` | `owner`, `repo` | Assess a GitHub project's health and activity |
| `check_service_health` | `services` (comma-separated) | Monitor multiple services at once |

Configuration

Environment variables let you adapt the server without touching code:

| Variable | Default | Description |
| --- | --- | --- |
| `SEARXNG_BASE_URL` | `http://localhost:2288/search` | Endpoint queried by `web_search`. |
| `SEARXNG_DEFAULT_CATEGORY` | `general` | Category used when none is provided. |
| `SEARXNG_DEFAULT_RESULTS` | `5` | Default number of search hits. |
| `SEARXNG_MAX_RESULTS` | `10` | Hard cap on hits per request. |
| `SEARXNG_CRAWL_MAX_CHARS` | `8000` | Default character budget for `crawl_url`. |
| `MCP_MAX_RESPONSE_CHARS` | `8000` | Overall response limit applied to every tool reply. |
| `SEARXNG_MCP_USER_AGENT` | `web-research-assistant/0.1` | User-Agent header for outward HTTP calls. |
| `PIXABAY_API_KEY` | (empty) | API key for Pixabay image search. Get free key at pixabay.com/api/docs. |
| `EXA_API_KEY` | (empty) | API key for Exa AI neural search. Get key at dashboard.exa.ai. |
| `SEARCH_PROVIDER` | `auto` | Search provider: `exa` (Exa only), `searxng` (SearXNG only), or `auto` (try Exa first, fallback to SearXNG). |
| `MCP_USAGE_LOG` | `~/.config/web-research-assistant/usage.json` | Location for usage analytics data. |
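
A rough sketch of how a few of these variables might be read, and how the `SEARCH_PROVIDER` modes translate into a provider order. The `Settings` class and both function names are hypothetical, and skipping Exa in `auto` mode when no key is set is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    searxng_base_url: str
    default_results: int
    max_results: int
    search_provider: str
    exa_api_key: str


def load_settings(env=os.environ) -> Settings:
    """Read a subset of the variables, falling back to the documented defaults."""
    return Settings(
        searxng_base_url=env.get("SEARXNG_BASE_URL", "http://localhost:2288/search"),
        default_results=int(env.get("SEARXNG_DEFAULT_RESULTS", "5")),
        max_results=int(env.get("SEARXNG_MAX_RESULTS", "10")),
        search_provider=env.get("SEARCH_PROVIDER", "auto").lower(),
        exa_api_key=env.get("EXA_API_KEY", ""),
    )


def provider_order(settings: Settings) -> list[str]:
    """Return the providers to try, in order, for the configured mode."""
    if settings.search_provider == "exa":
        return ["exa"]
    if settings.search_provider == "searxng":
        return ["searxng"]
    # "auto": try Exa first and fall back to SearXNG. Skipping Exa when no
    # key is set is an assumption of this sketch.
    return ["exa", "searxng"] if settings.exa_api_key else ["searxng"]
```

Passing a plain dictionary instead of `os.environ` makes the behavior easy to check in isolation, for example `provider_order(load_settings({"EXA_API_KEY": "k"}))`.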

Development

The codebase is intentionally modular and organized:

web-research-assistant/
├── src/searxng_mcp/     # Source code
│   ├── config.py        # Configuration and environment
│   ├── search.py        # SearXNG integration
│   ├── exa.py           # Exa AI neural search client
│   ├── crawler.py       # Crawl4AI wrapper
│   ├── images.py        # Pixabay client
│   ├── registry.py      # Package registries (npm, PyPI, crates, Go)
│   ├── github.py        # GitHub API client
│   ├── errors.py        # Error parser (language/framework detection)
│   ├── api_docs.py      # API docs discovery (NO hardcoded URLs)
│   ├── tracking.py      # Usage analytics
│   └── server.py        # MCP server + 13 tools
├── docs/                # Documentation (27 files)
└── [config files]

Each module is well under 400 lines, making the codebase easy to understand and extend.

Usage Analytics

All tools automatically track usage metrics including:

Tool invocation counts and success rates
Response times and performance trends
Common use case patterns (via the reasoning parameter)
Error frequencies and types

Analytics data is stored in ~/.config/web-research-assistant/usage.json and can be analyzed
to optimize tool usage and identify patterns. Each tool requires a reasoning parameter
that helps categorize why tools are being used, enabling better analytics and insights.

Note: As of the latest update, the reasoning parameter is required for all tools (previously optional with defaults). This ensures meaningful analytics data collection.
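
For a sense of what analyzing this data could involve, the sketch below aggregates per-tool counts, success rates, and average latency. The record fields (`tool`, `success`, `duration_ms`) are assumptions made for illustration; the actual layout of usage.json is not described here and may differ.

```python
from collections import defaultdict


def summarize(events: list[dict]) -> dict[str, dict]:
    """Aggregate call counts, success rate, and mean latency per tool."""
    buckets = defaultdict(list)
    for event in events:
        buckets[event["tool"]].append(event)
    summary = {}
    for tool, rows in buckets.items():
        summary[tool] = {
            "calls": len(rows),
            "success_rate": sum(1 for r in rows if r["success"]) / len(rows),
            "avg_ms": sum(r["duration_ms"] for r in rows) / len(rows),
        }
    return summary


# Hypothetical records; adapt the field names to the real file.
sample = [
    {"tool": "web_search", "success": True, "duration_ms": 400},
    {"tool": "web_search", "success": False, "duration_ms": 600},
    {"tool": "crawl_url", "success": True, "duration_ms": 1200},
]
report = summarize(sample)
print(report)
```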

Documentation

Comprehensive documentation is available in the docs/ directory:

Project Status - Current status, metrics, roadmap
API Docs Implementation - NEW tool documentation
Error Translator Design - Error translator details
Tool Ideas Ranked - Prioritization and progress
SearXNG Configuration - Recommended setup
Quick Start Examples - Usage examples

See the docs README for a complete index.