mcp-benchmark
Health: Warning
- License — License: Apache-2.0
- Description — Repository has a description
- Active repo — Last push 0 days ago
- Low visibility — Only 5 GitHub stars
Code: Passed
- Code scan — Scanned 12 files during light audit, no dangerous patterns found
Permissions: Passed
- Permissions — No dangerous permissions requested
Head-to-head benchmark comparing the official MCP to the MCP auto-created by Hintas.
MCP Benchmark
Official platform MCPs vs MCPs by Hintas, running head-to-head on the same prompts against mirrored workspaces.
The main purpose of this benchmark is to compare the official MCP servers that software vendors offer for their products against the MCP servers Hintas builds for the same products. The test covers popular platforms: Slack, Notion, and Gmail.
For each platform, the same prompts run under identical conditions, once against the platform's official MCP (baseline) and once against the MCP provided by Hintas (variant). The benchmark measures pass rate, token usage, tool-call count, wall time, and failure modes, then reports per-metric deltas as variant minus baseline (Hintas − Official) across the prompt suite.
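As a rough illustration of how the reported deltas relate to per-prompt results, the sketch below aggregates one run per stack and subtracts baseline from variant, matching the Δ (Hintas − Official) columns in the tables. The `PromptResult` record, its field names, and the use of per-prompt means for tokens, tool calls, and wall time are assumptions for illustration only. The harness's real schema may differ (see IMPLEMENTATION.md).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PromptResult:
    # Hypothetical per-prompt record; field names are illustrative, not the harness schema.
    prompt_id: str
    passed: bool
    tokens: int         # total input + output tokens for this prompt
    tool_calls: int
    wall_time_s: float


def summarize(results: list[PromptResult]) -> dict[str, float]:
    """Aggregate one run (one stack) over the whole prompt suite."""
    return {
        "pass_rate_pct": 100.0 * sum(r.passed for r in results) / len(results),
        "tokens": mean(r.tokens for r in results),
        "tool_calls": mean(r.tool_calls for r in results),
        "wall_time_s": mean(r.wall_time_s for r in results),
    }


def deltas(baseline: list[PromptResult], variant: list[PromptResult]) -> dict[str, float]:
    """Per-metric delta, reported as variant minus baseline (Hintas - Official)."""
    # A head-to-head comparison only makes sense if both stacks answered the same prompts.
    if {r.prompt_id for r in baseline} != {r.prompt_id for r in variant}:
        raise ValueError("baseline and variant must cover the same prompts")
    b, v = summarize(baseline), summarize(variant)
    return {k: v[k] - b[k] for k in b}
```

With this sign convention, a positive pass-rate delta favors Hintas, while positive token or wall-time deltas mean the Hintas MCP used more of that resource.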
Experiments and Results
Each platform was run head-to-head over a fixed prompt suite (48 prompts for Slack, 58 for Notion, 42 for Gmail), with the platform's official MCP and the Hintas-built MCP answering the same prompts against mirrored workspaces. The tables below summarize the results per metric; full per-prompt breakdowns are in each platform's report.
Slack
| Metric | Slack MCP - Official | Slack MCP - Hintas | Δ (Hintas − Official) |
|---|---|---|---|
| Success rate | 23% | 77% | +54.2 pp |
| Wall time | 16.9 s | 44.2 s | +27.2 s |
| Tokens | 4,132 | 11,684 | +7,552 |
Full report: experiments/slack/results.md
Notion
| Metric | Notion MCP - Official | Notion MCP - Hintas | Δ (Hintas − Official) |
|---|---|---|---|
| Success rate | 68% | 80% | +12.5 pp |
| Wall time | 45.4 s | 48.2 s | +2.8 s |
| Tokens | 78,172 | 74,411 | −3,761 |
Full report: experiments/notion/results.md
Gmail
| Metric | Gmail MCP - Official | Gmail MCP - Hintas | Δ (Hintas − Official) |
|---|---|---|---|
| Success rate | 50% | 71% | +21.4 pp |
| Wall time | 29.5 s | 56.9 s | +27.4 s |
| Tokens | 15,267 | 39,335 | +24,068 |
Full report: experiments/gmail/results.md
What gets measured
- Pass rate (per prompt, scored by an analyzer Claude session)
- Total input/output tokens
- Tool-call count
- Wall-clock time
- Failure modes (categorized)
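The five dimensions above could be rolled up from per-prompt records roughly as follows. This is a hypothetical sketch. `RunRecord`, its field names, and failure-mode labels such as `"wrong_tool"` are illustrative. The pass/fail verdict is taken as given from the analyzer session rather than computed here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    # Hypothetical per-prompt record; the harness's real schema may differ.
    passed: bool                        # verdict from the analyzer Claude session
    input_tokens: int
    output_tokens: int
    tool_calls: int
    wall_time_s: float
    failure_mode: Optional[str] = None  # category label, only meaningful when passed is False


def measure(records: list[RunRecord]) -> dict:
    """Roll the measured dimensions up across one prompt suite."""
    return {
        "pass_rate": sum(r.passed for r in records) / len(records),
        "input_tokens": sum(r.input_tokens for r in records),
        "output_tokens": sum(r.output_tokens for r in records),
        "tool_calls": sum(r.tool_calls for r in records),
        "wall_time_s": sum(r.wall_time_s for r in records),
        # Tally failed prompts by category; unlabeled failures are grouped together.
        "failure_modes": Counter(
            r.failure_mode or "uncategorized" for r in records if not r.passed
        ),
    }
```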
Quick start
Prerequisites:
- uv (`brew install uv`)
- `claude` CLI on `$PATH` (`npm i -g @anthropic-ai/claude-code`)
- `uv sync` to install the Python dependencies
Run a benchmark:
```bash
uv run benchmark run --platform slack --stack slack   # baseline
uv run benchmark run --platform slack --stack hintas  # variant
```
API tokens are read from `experiments/<name>/.env`. See each platform's README for the required variables.
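A `.env` file holds one `KEY=VALUE` pair per line. The minimal stdlib parser below shows roughly what reading `experiments/<name>/.env` involves. It is a sketch, not the harness's own loader, which may well use a library such as python-dotenv. Variable names in the usage are hypothetical.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skip blanks, comments, and lines without '='."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):  # tolerate shell-style "export KEY=VALUE"
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one layer of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key] = value
    return env


def load_env(name: str, root: Path = Path("experiments")) -> dict[str, str]:
    """Read experiments/<name>/.env relative to the current directory."""
    return parse_env((root / name / ".env").read_text())
```

Run from the repository root, `load_env("slack")` would return the Slack experiment's variables as a dict.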
Run uv run benchmark --help for the full subcommand and flag list.
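To run both sides of a comparison in one go, a small wrapper can issue the baseline and variant commands back to back. This sketch assumes the baseline stack for every platform is named after the platform, as it is for Slack (`--stack slack`). That naming is a guess for Notion and Gmail, so confirm it against each platform README or `uv run benchmark --help`. The helper names are hypothetical.

```python
import subprocess


def benchmark_commands(platform: str) -> list[list[str]]:
    """Build the baseline and variant commands for one platform.

    Assumes the baseline stack shares the platform's name (true for Slack).
    """
    base = ["uv", "run", "benchmark", "run", "--platform", platform]
    return [base + ["--stack", platform], base + ["--stack", "hintas"]]


def run_head_to_head(platform: str, dry_run: bool = False) -> None:
    """Run baseline then variant; with dry_run=True, only print the commands."""
    for cmd in benchmark_commands(platform):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing run instead of continuing silently.
            subprocess.run(cmd, check=True)
```

Calling `run_head_to_head("slack", dry_run=True)` prints the two Quick start commands without executing them. Running the baseline first and stopping on failure keeps a broken baseline from producing a one-sided result.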
Implementation
For the harness internals (pipeline subcommands, output layout, manifest schema, and how to add a new platform), see IMPLEMENTATION.md.
Built by Hintas