A standardized testing harness for MCP servers and agent workflows
A standardized testing harness for MCP servers and agent workflows. Define test cases as YAML fixtures (steps → expected tool calls → expected outputs), run regression suites directly from your MCP client, and get pass/fail results with diffs — without leaving Claude Code or Cursor.
Tool reference | Configuration | Fixture format | Contributing | Troubleshooting | Design principles
Fixtures can declare an expected_output and run without a server (simulation mode). Per-step assertions include output_contains, output_not_contains, output_equals, output_matches, schema_match, tool_called, and latency_under. Earlier step results can be referenced with {{steps.<step_id>.output}}.

Add the following config to your MCP client:
{
"mcpServers": {
"eval-runner": {
"command": "npx",
"args": ["-y", "mcp-eval-runner@latest"]
}
}
}
By default, eval fixtures are loaded from ./evals/ in the current working directory. To use a different path:
{
"mcpServers": {
"eval-runner": {
"command": "npx",
"args": ["-y", "mcp-eval-runner@latest", "--fixtures=~/my-project/evals"]
}
}
}
Amp · Claude Code · Cline · Cursor · VS Code · Windsurf · Zed
Create a file at evals/smoke.yaml. Use live mode (recommended) by including a server block:
name: smoke
description: "Verify eval runner itself is working"
server:
  command: node
  args: ["dist/index.js"]
steps:
  - id: list_check
    description: "List available test cases"
    tool: list_cases
    input: {}
    expect:
      output_contains: "smoke"
Then enter the following in your MCP client:
Run the eval suite.
Your client should return a pass/fail result for the smoke test.
Fixtures are YAML (or JSON) files placed in the fixtures directory. Each file defines one test case.
| Field | Required | Description |
| ------------- | -------- | ----------------------------------------------------------------------------------------- |
| name | Yes | Unique name for the test case |
| description | No | Human-readable description |
| server | No | Server config — if present, runs in live mode; if absent, runs in simulation mode |
| steps | Yes | Array of steps to execute |
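When the server block is omitted, the fixture runs in simulation mode: each step is checked against a canned expected_output instead of a live tool call. A minimal sketch — the tool name, input, and output below are hypothetical, not part of the runner's built-in tools:

```yaml
name: greeting-sim
description: "Simulation-mode example (no server block)"
steps:
  - id: greet
    tool: say_hello                  # hypothetical tool name
    input: { name: "Ada" }
    expected_output: "Hello, Ada!"   # canned output, used since no server is spawned
    expect:
      output_contains: "Ada"
```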
server block (live mode)

server:
  command: node           # executable to spawn
  args: ["dist/index.js"] # arguments
  env:                    # optional environment variables
    MY_VAR: "value"
When server is present the eval runner spawns the server as a child process, connects via MCP stdio transport, and calls each step's tool against the live server.
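Live-mode steps can also chain: the {{steps.<step_id>.output}} placeholder feeds one step's result into a later step's input. A hedged sketch, assuming a hypothetical run_case tool on the server under test:

```yaml
name: chained
description: "Pass one step's output into the next"
server:
  command: node
  args: ["dist/index.js"]
steps:
  - id: first
    tool: list_cases
    input: {}
  - id: second
    tool: run_case                        # hypothetical tool name
    input:
      case: "{{steps.first.output}}"      # templated from the first step's output
    expect:
      output_contains: "pass"             # assumed result text, for illustration
```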
steps array

Each step has the following fields:

| Field | Required | Description |
| ----- | -------- | ----------- |