web-research

mcp
Guvenlik Denetimi
Uyari
Health Uyari
  • No license — Repository has no license file
  • Description — Repository has a description
  • Active repo — Last push 0 days ago
  • Low visibility — Only 5 GitHub stars
Code Gecti
  • Code scan — Scanned 2 files during light audit, no dangerous patterns found
Permissions Gecti
  • Permissions — No dangerous permissions requested

Bu listing icin henuz AI raporu yok.

SUMMARY

Token-efficient web research for AI agents; tinyfish search + Groq summarisation, 99% fewer tokens than raw HTML

README.md

web-research

Token-efficient web research for AI agents.

Combines TinyFish web search with configurable summarization providers, currently Groq and GitHub Copilot CLI. Use it as a CLI or as an MCP server.

Instead of dumping raw HTML, scripts, navbars, ads, and cookie banners into model context, it returns clean focused summaries with sources.

AI Agent / MCP Client
        ↓
wr-mcp (or wr CLI)
        ↓
search + fetch + clean + summarize + cache
        ↓
clean context with sources

Why

  • Native WebSearch dumps raw snippets into Claude context (expensive)
  • Native WebFetch sends full HTML to Claude (very expensive)
  • wr offloads both to external APIs, returning only the relevant summary

Install

Prerequisites: Go 1.25+

git clone https://github.com/mrvarmazyar/web-research
cd web-research
go build -o ~/.local/bin/wr ./cmd/wr
go build -o ~/.local/bin/wr-mcp ./cmd/wr-mcp

Make sure ~/.local/bin is in your PATH:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile

Configuration

Groq

Add to ~/.bash_profile or ~/.zshrc:

export TINYFISH_API_KEY="..."   # https://agent.tinyfish.ai → Settings → API Keys
export WR_SUMMARIZER_PROVIDER="groq"      # optional: groq (default) or copilot
export WR_SUMMARIZER_MODEL="llama-3.1-8b-instant"  # optional: override provider default model
export GROQ_API_KEY="..."       # required when provider=groq
export WR_CACHE_DAYS=7          # optional, cache TTL in days (default: 7)

Copilot CLI (experimental)

Install a Copilot CLI that exposes a copilot binary, then verify:

copilot --help
copilot login
export TINYFISH_API_KEY="..."   # https://agent.tinyfish.ai → Settings → API Keys
export WR_SUMMARIZER_PROVIDER="copilot"
export WR_SUMMARIZER_MODEL="gpt-5-mini"  # optional; COPILOT_MODEL also works
export COPILOT_MODEL="gpt-5-mini"        # optional
# Authenticate via the standalone `copilot login` or export one of:
# export COPILOT_GITHUB_TOKEN="..."
# export GH_TOKEN="..."
# export GITHUB_TOKEN="..."
export WR_CACHE_DAYS=7

Verify setup:

wr setup

Expected output:

OK       TINYFISH_API_KEY
OK       summarizer provider = groq
OK       GROQ_API_KEY
OK       summarizer model = llama-3.1-8b-instant

All configured. Try: wr research "your query here"

Usage

wr research: full pipeline (recommended)

Searches, fetches top 3 results, summarizes each with the configured provider.

wr research "stripe webhook idempotency next.js app router"
wr research "tailwind v4 migration breaking changes"
wr research --provider copilot --model gpt-5-mini "next-intl v4 RTL Arabic configuration"
wr research "next-intl v4 RTL Arabic configuration"

wr search: search only

Returns titles, snippets, and URLs. Useful when you just need to find the right page.

wr search "golang http middleware pattern"
wr search "postgres JSONB index performance"

wr fetch: fetch and summarize a URL

Fetches a URL, converts HTML to markdown, and summarizes with the selected provider/model.

wr fetch "https://nextjs.org/docs/app/building-your-application/routing/middleware" "how to match locale prefixes"
wr fetch --provider copilot --model gpt-5-mini "https://docs.stripe.com/webhooks" "signature verification best practices"
wr fetch "https://docs.stripe.com/webhooks" "signature verification best practices"

Token Reduction Benchmark

wr bench
URL                                           Raw HTML      Raw MD   wr output   vs HTML     vs MD
────────────────────────────────────────────────────────────────────────────────────────────────────
https://nextjs.org/docs/...                     1.2 MB     29.4 KB       760 B     99.9%     97.5%
https://docs.stripe.com/webhooks                1.4 MB     39.3 KB      11.8 KB     99.2%     70.0%
https://next-intl.dev/docs/...                202.5 KB     24.2 KB      11.8 KB     94.2%     51.3%
────────────────────────────────────────────────────────────────────────────────────────────────────
TOTAL                                           2.9 MB     92.8 KB      24.3 KB     99.2%     73.8%

Token estimates (chars / 4):

Method Tokens Reduction vs HTML
Raw HTML 747,388
Raw Markdown 23,767 96.8%
wr output 6,221 99.2%

How It Works

wr research "query"
    │
    ├── TinyFish search API
    │   └── returns top 10 results (title, snippet, URL)
    │
    ├── For each of top 3 results:
    │   ├── Check local cache (~/.cache/web-research/)
    │   ├── If miss: fetch URL with net/http
    │   │   └── convert HTML → markdown (html-to-markdown)
    │   ├── Store in cache
    │   └── Summarizer provider (Groq or Copilot CLI)
    │       └── summarize markdown, extract relevant info
    │
    └── Print summaries

Cache is keyed by URL SHA-256, stored at ~/.cache/web-research/, expires after WR_CACHE_DAYS days.

Fallbacks:

  • If GROQ_API_KEY is unset while using groq, returns truncated raw markdown instead of a summary.
  • If Copilot auth/CLI is unavailable while using copilot, commands fall back to truncated/raw content.

Setup note: wr setup --provider copilot can confirm that the copilot CLI is installed, but if you rely on a prior copilot login session it can only report that auth may be available; the CLI does not expose a documented non-interactive auth-status command.

MCP Server

wr-mcp exposes web-research as MCP tools for Claude Desktop, Cursor, Codex, and any MCP-compatible client.

Tool Purpose
web_search Search the web, return clean results
web_fetch Fetch one URL and summarize it (provider and model optional)
web_research Search + fetch top pages + synthesized answer with sources (provider and model optional)

See docs/mcp.md for client config examples (Claude Desktop, Cursor, generic).

AI Agent Integration

Claude Code

This repo includes a SKILL.md for Claude Code. Symlink it into your skills directory:

ln -s ~/web-research ~/.claude/skills/web-research

Then add to ~/.claude/CLAUDE.md:

## Web Research

For all web searches and URL fetching, use `wr`:
- Search: `wr search "<query>"`
- Fetch: `wr fetch "<url>" "<what to extract>"`
- Full research: `wr research "<query>"`

Skip for: GitHub repos/issues (use gh CLI), Jira/Confluence (use MCP tools), internal URLs.

Claude will automatically use wr instead of native WebSearch/WebFetch.

OpenAI Codex

Create ~/.codex/web-research.md:

# Web Research

For all web searches and URL fetching, use the `wr` CLI instead of built-in search tools.

## Commands

\`\`\`bash
wr research "<query>"       # search + fetch top 3 + summarize (preferred)
wr search "<query>"         # search only → URLs + snippets
wr fetch "<url>" "<prompt>" # fetch single URL + summarize
\`\`\`

## When to Use
- Any research task, finding docs, "search for X", "look up X"
- Fetching any public documentation URL

## When to Skip
- GitHub repos/issues/PRs → use `gh` CLI
- Internal/authenticated URLs → use curl directly

Then reference it from ~/.codex/AGENTS.md:

@/Users/yourname/.codex/web-research.md

Dependencies

Package Purpose
html-to-markdown HTML → Markdown conversion
TinyFish API Live browser-rendered web search
Groq API Groq-hosted summarization
GitHub Copilot CLI Optional local summarizer backend

License

MIT

Yorumlar (0)

Sonuc bulunamadi