mcptest
The test suite your MCP server is missing. Tool, resource, agent-loop, schema-drift, compliance, and security tests against any Model Context Protocol server, in CI on every commit. One YAML suite, deterministic cassette replay, multi-model comparison, and LLM-judge evals. Rust, Apache-2.0.
Test your MCP server like you test the rest of your code.
Website · Documentation · Examples · Example servers
mcptest is an open-source CLI for testing Model Context Protocol
servers. You write checks as YAML, point them at any MCP server, and run
them from your terminal, your CI, or your coding agent.
A passing unit-test suite tells you your handler returns the right value.
It tells you nothing about what your server puts on the wire: whether the
initialize handshake completes, whether the tool catalog still says what
you think it says, whether a tools/call over stdio or HTTP returns the
response a client will actually see. mcptest speaks MCP end to end and
checks exactly that. You get a deterministic pass or fail, and when
something breaks, a structured failure that names the assertion, the
payload the server sent, and a one-line repro.
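To make "what your server puts on the wire" concrete, here is a minimal sketch of the JSON-RPC 2.0 messages an MCP client sends over stdio. It illustrates the protocol message shapes only, not mcptest's internals; the protocol version string, the client name, and the `search` tool are assumptions chosen for the example.

```python
import json


def initialize_request(request_id: int) -> dict:
    """Build the MCP initialize request a client sends first."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-06-18",  # assumed; use the version you target
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
    }


def initialized_notification() -> dict:
    """Notification the client sends after the server answers initialize."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def tools_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a tools/call request for one tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def frame_for_stdio(message: dict) -> bytes:
    """stdio transport: one JSON message per line, newline-terminated."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")
```

A handler unit test never sees these envelopes, which is why a wire-level check can fail while the unit suite stays green.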
Try it in three commands
curl -fsSL https://download.mcptest.sh/install.sh | sh # or: brew install soapbucket/tap/mcptest
mcptest init # writes a starter suite under tests/
mcptest run # deterministic verdicts, structured failures
mcptest init scaffolds a suite that targets a built-in mock server
(mcptest mock), so the first mcptest run passes offline with no real
server and no network. Swap the command: for your own server when you
are ready. If an MCP client on your machine already knows your server,
mcptest init --from-discovered <name> scaffolds against it instead.
How it works
You describe the contract in YAML: a server, a call, and what the
response should look like.
# mcptest.yml
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json
servers:
  api:
    command: ["./your-server"] # or: url: https://example.com/mcp
tools:
  - name: search returns a result for a known query
    server: api
    tool: search
    args: { query: "anthropic" }
    expect:
      - target: result.content[0].text
        matcher: { contains: "results for" }
mcptest run starts your server (or connects to its URL), performs the
MCP initialize handshake over stdio, streamable HTTP, or legacy SSE,
makes the call a real client would make, and checks the response against
your assertions. It prints one line per check and exits 0 when
everything holds, 1 when something breaks.
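One way to picture how a target and a matcher turn into an exit code is the sketch below. It is an illustration under assumptions, not mcptest's implementation: it assumes a target is a dotted path with `[n]` indexes resolved against the response, and it covers only the matcher kinds that appear in this README's examples (`contains`, `exact`, `regex`).

```python
import re

# Splits "result.content[0].text" into keys and list indexes.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve(target: str, payload):
    """Walk a dotted path with [n] indexes through nested dicts and lists."""
    value = payload
    for key, index in _TOKEN.findall(target):
        value = value[int(index)] if index else value[key]
    return value


def check(matcher: dict, actual) -> bool:
    """Evaluate a single-key matcher such as {"contains": "results for"}."""
    (kind, expected), = matcher.items()
    if kind == "contains":
        return expected in actual
    if kind == "exact":
        return actual == expected
    if kind == "regex":
        return re.search(expected, str(actual)) is not None
    raise ValueError(f"unsupported matcher: {kind}")


def run_checks(checks: list, payload) -> int:
    """Return a process-style exit code: 0 if every check holds, else 1."""
    failures = [
        c for c in checks
        if not check(c["matcher"], resolve(c["target"], payload))
    ]
    return 0 if not failures else 1
```

For example, `run_checks` returns 0 for a response whose first content item has text containing "results for", and 1 as soon as any assertion stops matching.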
push -> mcptest run -> assert on the wire -> exit 0 / exit 1
When the server drifts, the same suite catches it. The failure names the
assertion, shows what it expected against what the server sent, and exits
non-zero so CI can gate on it.
One binary, the whole surface
The same engine covers the things teams otherwise test with one-off
scripts. Each is a YAML block or a subcommand, and each exits with a code
CI understands.
- Tools, resources, prompts. Assert on real responses; catch catalog
and input-schema drift.
- The agent loop. Drive a real model across one or more servers and
assert on the trace it produces (tool choice, arguments, tokens, cost).
- Spec conformance. Grade a server against a pinned MCP protocol
version (mcptest conformance run).
- Schema drift. Diff the tool catalog against a baseline and classify
each change as breaking or not (mcptest diff).
- Security. Scan tool, prompt, and resource definitions for
injection, exfiltration, and shadowing, and report as SARIF
(mcptest security).
- Offline replay. Record real exchanges to cassettes and replay them
in CI with no keys and no spend.
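The schema-drift idea can be sketched in a few lines. The classification rules here are assumptions picked for illustration (a removed tool, a removed property, or a newly required property counts as breaking; an added tool or added optional property does not); the rules mcptest diff actually applies may differ.

```python
def classify_drift(baseline: dict, current: dict) -> list:
    """Compare two catalogs shaped {tool_name: input_schema}.

    Returns (tool, change, severity) tuples. The severity rules are
    illustrative assumptions, not mcptest's actual policy.
    """
    changes = []
    for name in sorted(baseline.keys() - current.keys()):
        changes.append((name, "tool removed", "breaking"))
    for name in sorted(current.keys() - baseline.keys()):
        changes.append((name, "tool added", "non-breaking"))
    for name in sorted(baseline.keys() & current.keys()):
        old, new = baseline[name], current[name]
        old_props = old.get("properties", {})
        new_props = new.get("properties", {})
        old_req = set(old.get("required", []))
        new_req = set(new.get("required", []))
        # A caller that omitted this argument before will now be rejected.
        for prop in sorted(new_req - old_req):
            changes.append((name, f"'{prop}' became required", "breaking"))
        # A caller that still sends this argument may now be rejected.
        for prop in sorted(old_props.keys() - new_props.keys()):
            changes.append((name, f"'{prop}' removed", "breaking"))
        # New optional arguments do not affect existing callers.
        for prop in sorted(new_props.keys() - old_props.keys() - new_req):
            changes.append((name, f"optional '{prop}' added", "non-breaking"))
    return changes
```

A CI gate built on this would fail the run only when at least one tuple carries the "breaking" label.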
Test the agent loop, replay it offline
Point one YAML test at one model or a list of them. mcptest lists the
tools on every server you name, sends the prompt to the model with that
catalog attached, dispatches the tool calls the model makes, and records
the conversation. Your assertions resolve against the trace, so the same
suite checks that the model picked the right tool and that the run stayed
inside a token budget.
agents:
  - name: weather query routes to get_weather
    models: [claude-sonnet-4-5, gpt-5, gemini-2.5-pro]
    servers: [weather]
    prompt: What is the weather in Sacramento?
    expect:
      - target: tool_calls[0].name
        matcher: { exact: get_weather }
      - target: conversation.tokens.total
        matcher: { regex: "^[0-9]+$" }
Record once with your provider keys, and each (test, model) pair gets
its own cassette. After that a plain mcptest run replays them in CI,
deterministically, without spending a cent. Add a model identifier to
models:, re-record, and the report tells you which assertion broke for
which model.
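The record-then-replay pattern can be sketched as below. This is a generic illustration, not mcptest's cassette format: the file naming, the JSON layout, and the `Cassette` class itself are hypothetical. It keeps one file per (test, model) pair and refuses to go to the network when it is not recording.

```python
import hashlib
import json
from pathlib import Path


class Cassette:
    """Record request/response pairs once, then replay them offline.

    Hypothetical sketch: one JSON file per (test, model) pair.
    """

    def __init__(self, directory, test: str, model: str, record: bool = False):
        slug = hashlib.sha256(f"{test}\n{model}".encode()).hexdigest()[:16]
        self.path = Path(directory) / f"{slug}.json"
        self.record = record
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def _key(request: dict) -> str:
        # Canonical JSON so key order in the request does not change the key.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def call(self, request: dict, live=None):
        """Return the recorded response, or record one via live(request)."""
        key = self._key(request)
        if self.record:
            self.entries[key] = live(request)
            self.path.write_text(json.dumps(self.entries, indent=2))
        elif key not in self.entries:
            raise LookupError("no recorded exchange; re-record this cassette")
        return self.entries[key]
```

Raising on a miss, rather than silently calling the provider, is what keeps a replay run deterministic and free of spend.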
Providers covered today: Anthropic, OpenAI (including the o-series),
Google Gemini, Mistral, plus any OpenAI-compatible endpoint (Azure,
OpenRouter, vLLM, llama.cpp, LiteLLM, Together, Groq, Bedrock-fronted
Anthropic) through a named providers: block. Sweep a whole suite across
models with mcptest run --models a,b,c and get a test-by-model grid.
Background is in docs/models.md.
Use mcptest from your coding agent
mcptest ships an MCP server of its own, so Claude Code, Cursor, or any
MCP-capable agent can run the full testing loop. Two commands hand it the
keys:
mcptest mcp-server --install --enable-writes # the agent-facing verbs
mcptest skill --install # the packaged skill
The agent scaffolds a validated suite from the server's real catalog,
sharpens the generic checks against observed responses, runs the suite,
and reads back a failure that already carries the assertion, the actual
value, and a one-line repro. The agent brings the judgment; mcptest
supplies the deterministic ground truth it cannot invent, and the YAML it
leaves behind is the diffable audit trail a human reviews. See
the agent interface for the verb reference
and the model-facing --reporter agent output.
Run it in CI
The suite is a diffable YAML file you run on every commit. Reporters cover
the formats CI already understands, and a single run writes
machine-readable artifacts CI can store.
# .github/workflows/mcptest.yml
- name: Install mcptest
  run: curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=v1.1.0 sh
- name: Run mcptest
  run: mcptest run --reporter junit --output mcptest.junit.xml
Pick the reporter with --reporter: pretty (default), minimal, json,
junit, md, html, sarif, gitlab, ndjson, tap, matrix, or quiet. Capture
the JSON envelope once, then re-render it
into any other format with mcptest report --format, with no second run
and no second API call.
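The "capture once, render many" idea can be sketched as two pure functions over the same results object. The envelope shape used here (`suite`, `tests`, `status`, `message`, `actual`) is an assumption made up for the example, not the schema mcptest emits.

```python
import xml.etree.ElementTree as ET


def to_junit(envelope: dict) -> str:
    """Render an assumed results envelope as a JUnit <testsuite> document."""
    tests = envelope["tests"]
    failed = [t for t in tests if t["status"] != "pass"]
    suite = ET.Element(
        "testsuite",
        name=envelope.get("suite", "mcptest"),
        tests=str(len(tests)),
        failures=str(len(failed)),
    )
    for t in tests:
        case = ET.SubElement(suite, "testcase", name=t["name"])
        if t["status"] != "pass":
            failure = ET.SubElement(case, "failure", message=t.get("message", ""))
            failure.text = str(t.get("actual", ""))
    return ET.tostring(suite, encoding="unicode")


def to_markdown(envelope: dict) -> str:
    """Render the same envelope as a Markdown table."""
    rows = ["| test | status |", "| --- | --- |"]
    rows += [f"| {t['name']} | {t['status']} |" for t in envelope["tests"]]
    return "\n".join(rows)
```

Because both renderers read stored results, neither needs to start the server or call a model again.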
Why mcptest
Inspectors and one-off scripts tell you a server looked right once. A
general eval framework grades the model that calls your tools, not the
server on the other side of the call. mcptest is the part you can commit:
one binary that checks the protocol contract, the behavior, the agent
loop, schema drift, and tool-definition security, and turns each into a
stable exit code. It is a single static binary with no telemetry and no
auto-update, it is Apache-2.0, and it bakes a CycloneDX Software Bill of
Materials into the binary so you can read the dependency list from the
copy you already have.
mcptest sbom # the embedded CycloneDX SBOM
mcptest sbom --verify # re-hash it to catch tampering
Every release is Sigstore-signed and carries SLSA L3 build provenance.
The full verification walkthrough lives at
mcptest.sh/trust.
Install
Homebrew (macOS, Linux):
brew install soapbucket/tap/mcptest
curl installer (macOS, Linux, Apple Silicon and arm64 included):
curl -fsSL https://download.mcptest.sh/install.sh | sh
The installer detects your platform, downloads the signed release tarball
from download.mcptest.sh, verifies its sha256 against the sums file, and
drops mcptest into ~/.local/bin (or /usr/local/bin under sudo).
Inspect it first with curl -fsSL https://download.mcptest.sh/install.sh | less.
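If you prefer to check a downloaded tarball by hand, the sha256 comparison the installer performs can be approximated as follows. This sketch assumes the sums file uses the common sha256sum layout (hex digest, whitespace, file name); the actual file names and layout published at download.mcptest.sh are not confirmed here.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large tarballs are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, sums_text: str, name: str) -> bool:
    """Check path against the entry for name in sha256sum-style text.

    Assumed line format: '<hex digest>  <filename>' (a leading '*' on the
    file name, which sha256sum uses for binary mode, is ignored).
    """
    for line in sums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == name:
            return parts[0].lower() == sha256_of(path)
    return False  # no entry for this file: treat as unverified
```

Returning False when the file has no entry is deliberate: a missing checksum should fail closed rather than pass.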
Docker:
docker run --rm -v "$PWD":/work -w /work soapbucket/mcptest:latest run
Documentation
Full documentation lives under docs/. Start here:
- Getting started: install to first passing
test in about five minutes.
- What is mcptest: the one-page definition.
- Concepts: the mental model.
- YAML reference: every field, every matcher.
- CLI reference: every subcommand, every flag.
- Examples: runnable suites across the whole
surface, plus mcptest-examples
for complete end-to-end suites against ten popular servers.
SDKs drive mcptest from your own test runner: Python (pytest), TypeScript
(vitest, jest, mocha, node:test), Go, Rust (proc-macro), .NET (xUnit),
and JVM (JUnit 5). See docs/sdks.md.
Build from source
cargo build --release
./target/release/mcptest --help
./scripts/check.sh # the full gate: fmt + clippy + doc + build + test
License
Apache-2.0. See LICENSE and NOTICE.
Copyright 2026 Soap Bucket LLC and the mcptest contributors. Soap Bucket
LLC at soapbucket.com.
Links
- Documentation: docs/index.md
- Releases: github.com/soapbucket/mcptest/releases
- Issues and roadmap: github.com/soapbucket/mcptest/issues
- X (Twitter): @soapbucket