mcp-benchmark


SUMMARY

Head-to-head benchmark comparing the official MCP to the MCP auto-created by Hintas.

README.md

MCP Benchmark

Hintas

Official platform MCPs vs MCPs by Hintas, running head-to-head on the same prompts against mirrored workspaces.

License: Apache 2.0 · Python ≥3.10 · Platforms: Slack | Notion | Gmail · hintas.com


The main purpose of this benchmark is to compare the official MCPs that software vendors ship for their products against the MCPs Hintas builds for the same products. The test covers three popular platforms: Slack, Notion, and Gmail.

For each platform, the same prompts run under identical conditions, once against the platform's official MCP (baseline) and once against the MCP provided by Hintas (variant). The benchmark measures pass rate, token usage, tool-call count, wall time, and failure modes, then reports variant minus baseline deltas (Hintas − Official) across the prompt suite.
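As a rough illustration of the delta reporting, the sketch below subtracts baseline metrics from variant metrics, following the Hintas − Official sign convention used in the results tables. The function name `metric_deltas` and the dictionary layout are assumptions made for this example, not the harness's actual API; the token figures are the Slack and Notion values reported in this README.

```python
def metric_deltas(baseline: dict[str, float], variant: dict[str, float]) -> dict[str, float]:
    """Return variant minus baseline for every metric present in both inputs.

    Hypothetical helper: the real harness may store and compare metrics differently.
    """
    shared = baseline.keys() & variant.keys()
    return {name: variant[name] - baseline[name] for name in sorted(shared)}


# Token figures taken from the Slack and Notion tables in this README.
slack = metric_deltas({"tokens": 4132}, {"tokens": 11684})
notion = metric_deltas({"tokens": 78172}, {"tokens": 74411})

print(slack)   # {'tokens': 7552}
print(notion)  # {'tokens': -3761}
```

A positive delta means the Hintas MCP used more of that resource than the official MCP. For success rate a positive delta is an improvement, while for time and tokens it is a cost.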

Experiments and Results

Each platform was run head-to-head over a fixed prompt suite (48 prompts for Slack, 58 for Notion, 42 for Gmail), with the platform's official MCP and the Hintas-built MCP answering the same prompts against mirrored workspaces. The tables below summarize the per-dimension verdicts, and full per-prompt breakdowns live in each platform's report.

Slack

| Metric | Slack MCP - Official | Slack MCP - Hintas | Δ (Hintas − Official) |
| --- | --- | --- | --- |
| Success rate | 23% | 77% | +54.2 pp |
| Speed | 16.9 s | 44.2 s | +27.2 s |
| Tokens | 4,132 | 11,684 | +7,552 |

Full report: experiments/slack/results.md

Notion

| Metric | Notion MCP - Official | Notion MCP - Hintas | Δ (Hintas − Official) |
| --- | --- | --- | --- |
| Success rate | 68% | 80% | +12.5 pp |
| Speed | 45.4 s | 48.2 s | +2.8 s |
| Tokens | 78,172 | 74,411 | −3,761 |

Full report: experiments/notion/results.md

Gmail

| Metric | Gmail MCP - Official | Gmail MCP - Hintas | Δ (Hintas − Official) |
| --- | --- | --- | --- |
| Success rate | 50% | 71% | +21.4 pp |
| Speed | 29.5 s | 56.9 s | +27.4 s |
| Tokens | 15,267 | 39,335 | +24,068 |

Full report: experiments/gmail/results.md

What gets measured

  • Pass rate (per prompt, scored by an analyzer Claude session)
  • Total input/output tokens
  • Tool-call count
  • Wall-clock time
  • Failure modes (categorized)

Quick start

Prerequisites:

  • uv (brew install uv)
  • claude CLI on $PATH (npm i -g @anthropic-ai/claude-code)
  • uv sync to install Python deps

Run a benchmark:

uv run benchmark run --platform slack --stack slack    # baseline
uv run benchmark run --platform slack --stack hintas   # variant
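
To run both stacks back to back from a script, the two commands above can be wrapped in a small Python helper. This is a sketch rather than part of the harness: `build_command` and `run_pair` are hypothetical names, and only the `--platform` and `--stack` flags shown above are used. Stack names are passed in explicitly because this README shows them only for Slack.

```python
import subprocess


def build_command(platform: str, stack: str) -> list[str]:
    """Build the argv for one benchmark run, mirroring the commands shown above."""
    return ["uv", "run", "benchmark", "run", "--platform", platform, "--stack", stack]


def run_pair(platform: str, baseline_stack: str, variant_stack: str) -> None:
    """Run the baseline stack, then the variant, stopping at the first failure."""
    for stack in (baseline_stack, variant_stack):
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(build_command(platform, stack), check=True)
```

Calling `run_pair("slack", "slack", "hintas")` would issue the same two commands in the same order.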

Access tokens (credentials, not the LLM token counts reported above) are read from experiments/<name>/.env. See the platform README for the required variables.
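
For readers unfamiliar with the `.env` convention, the sketch below shows a minimal way such a file could be parsed into key/value pairs. It is not the harness's loader, which may rely on a library and support more syntax; `parse_env`, `load_platform_env`, and the variable name `EXAMPLE_TOKEN` are hypothetical. The real variable names are listed in each platform README.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_platform_env(repo_root: Path, name: str) -> dict[str, str]:
    """Read experiments/<name>/.env relative to the repository root."""
    env_path = repo_root / "experiments" / name / ".env"
    return parse_env(env_path.read_text(encoding="utf-8"))


# Hypothetical file contents, for illustration only.
print(parse_env('# credentials\nEXAMPLE_TOKEN="abc123"\n'))  # {'EXAMPLE_TOKEN': 'abc123'}
```

Keeping credentials in a per-experiment `.env` file keeps them out of the command line and out of shell history.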

Run uv run benchmark --help for the full subcommand and flag list.

Implementation

For the harness internals (pipeline subcommands, output layout, manifest schema, and how to add a new platform), see IMPLEMENTATION.md.


Built by Hintas
