An MCP server to judge an LLM's responses
The LLM as Judge MCP Server enables users of LLMs to get a second opinion for an LLM's response. The MCP tool sends the user's question, the LLM's response, and an optional focus for the evaluation to a second LLM for evaluation. This second opinion can be used to then improve an LLM's response and give users another perspective regarding the original LLM's response.
Is it safe?
No known CVEs for @jgsheppa/llm-as-judge-mcp-server.
No authentication — any process on your machine can connect.
Licensed under MIT.
Is it maintained?
Last commit 140 days ago. 11 weekly downloads.
Will it work with my client?
Transport: stdio. Works with Claude Desktop, Cursor, Claude Code, and most MCP clients.
Context cost
1 tool. ~200 tokens (0.1% of 200K).
Run this in your terminal to verify the server starts:
npx -y '@jgsheppa/llm-as-judge-mcp-server' 2>&1 | head -1 && echo "✓ Server started successfully"
evaluate_llm_response: Get a second opinion on an LLM's response by sending the user's question, the LLM's response, and an optional focus for evaluation to a second LLM for assessment.
default_judge_prompt: The default prompt for an LLM to evaluate another LLM's response, customizable via the --prompt-path argument.
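Under the hood, an MCP client invokes the tool with a standard `tools/call` request. A sketch of such a request, assuming the argument names `question`, `response`, and `focus` (inferred from the tool description above, not confirmed against the server's actual schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "evaluate_llm_response",
    "arguments": {
      "question": "What causes ocean tides?",
      "response": "Tides are caused mainly by the Moon's gravitational pull.",
      "focus": "factual accuracy"
    }
  }
}
```

In practice your MCP client constructs this request for you; the snippet only illustrates what crosses the stdio transport.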
You can run this MCP server with Node.js using the following command. The same command can be used in MCP configuration files, as shown in the setup section below.
npx -y @jgsheppa/llm-as-judge-mcp-server
You can also download a binary that matches your machine's architecture to start using llm-as-judge-mcp-server.
To set up llm-as-judge-mcp-server, define the MCP server in a JSON file wherever your LLM client loads MCP server configurations.
{
  "mcpServers": {
    "llm-as-judge": {
      "command": "npx",
      "args": [
        "-y",
        "@jgsheppa/llm-as-judge-mcp-server",
        "stdio",
        "-p",
        "gemini"
      ],
      "env": {
        "GEMINI_API_KEY": "your-api-key"
      }
    }
  }
}
To customize the model and prompt, your configuration would look like this:
{
  "mcpServers": {
    "llm-as-judge": {
      "command": "npx",
      "args": [
        "-y",
        "@jgsheppa/llm-as-judge-mcp-server",
        "stdio",
        "-p",
        "gemini",
        "-m",
        "gemini-2.5-flash",
        "--prompt-path",
        "/Users/firstlast/Desktop/PROMPT.md"
      ],
      "env": {
        "GEMINI_API_KEY": "your-api-key"
      }
    }
  }
}
To define your provider and model, pass them as arguments to the MCP server. While there is a default prompt for the LLM judge, you can also supply your own custom prompt by passing its full file path to the --prompt-path argument.
Four providers are currently available for this MCP server: Anthropic, Gemini, Ollama, and OpenAI.
More providers can and will be added in the future.
Any model offered by these providers can be used as an LLM judge. It is up to you to decide which model works best for your use case, but a bigger frontier model is not necessarily the best option for evaluating another frontier model's responses; smaller, less expensive models can be great options as well.
To improve the out-of-the-box experience, each provider has a default model, chosen for cost efficiency. These defaults may change in the future, and since they are customizable, you can always override them.
| Provider  | Default Model    |
|-----------|------------------|
| Anthropic | claude-haiku-4-5 |
| Gemini    | gemini-2.5-flash |
| Ollama    | gemma3:4b        |
| OpenAI    | gpt-5-mini       |
This MCP server offers a default prompt for an LLM to evaluate another LLM's response, but you can also provide your own prompt for further customization.
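As a sketch, a custom prompt file could be created like this; the file name and rubric wording are illustrative, not the server's built-in default:

```shell
# Write a hypothetical judge prompt to a file; the rubric is an example only.
cat > ./judge-prompt.md <<'EOF'
You are an impartial judge. Given a user's question and an LLM's response,
evaluate the response for accuracy, completeness, and clarity.
If an evaluation focus is provided, weight it most heavily.
Return a short verdict followed by concrete suggestions for improvement.
EOF
```

Pass the file's absolute path to the server via the --prompt-path argument in your MCP configuration.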