Production-ready RAG + MCP demo: eval-in-CI merge gate, Langfuse traces, structure-aware chunking.
Config is the same across clients — only the file and path differ.
{
"mcpServers": {
"ikb": {
"args": [
"/absolute/path/to/internal-knowledge-base/scripts/mcp_server.py"
],
"command": "python"
}
}
}Are you the author?
Add this badge to your README to show your security score and help users find safe servers.
A public Retrieval-Augmented Generation pipeline exposed as an MCP server. Sample content from Veterans Affairs education manuals.
No automated test available for this server. Check the GitHub README for setup instructions.
Five weighted categories — click any category to see the underlying evidence.
No known CVEs.
No package registry to scan.
Click any tool to inspect its schema.
documentAccess to documents in the knowledge base by source ID
document://{source_id}
cite_from_chunksPrompt for generating answers with citations from retrieved chunks
Be the first to review
Have you used this server?
Share your experience — it helps other developers decide.
Sign in to write a review.
Others in ai-ml / search
Dynamic problem-solving through sequential thought chains
Persistent memory using a knowledge graph
Web and local search using Brave Search API
Production ready MCP server with real-time search, extract, map & crawl.
MCP Security Weekly
Get CVE alerts and security updates for io.github.kimsb2429/internal-knowledge-base and similar servers.
Start a conversation
Ask a question, share a tip, or report an issue.
Sign in to join the discussion.
A public Retrieval-Augmented Generation pipeline exposed as an MCP server. Sample content from Veterans Affairs education manuals.
The repo implements evaluation, observability, and structure-aware ingestion. Cost/latency tuning, tenant-level access control, and other production concerns are discussed in the article linked below.
📖 Full writeup in Towards AI: Enterprise Internal Knowledge Base RAG MCP: POC-to-Production
RAG demos tend to focus on the quality of the retrieval pipeline, without recognizing that production RAG fails on the next ten steps: prompt or model changes that pass code review but tank answer quality, cost and latency drift that cannot be traced to specific queries, cross-tenant leakage that only surfaces in audit. This repo shows what catching them looks like in practice.
The corpus is public (VA Education manuals — 238 documents, 9,000+ chunks) so anyone can clone, run, and adapt the pipeline.
git clone https://github.com/kimsb2429/internal-knowledge-base
cd internal-knowledge-base
# 1. Start Postgres + pgvector
docker compose up -d
# 2. Python env + dependencies
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# 3. Restore corpus fixture (~2 min — 238 docs + 9k chunks pre-embedded)
docker exec -i ikb_pgvector pg_restore -U ikb -d ikb < evals/fixture_v1.dump
# 4. Smoke-test the MCP server
python scripts/test_mcp_server.py # 7/7 tests pass
# 5. Start the MCP server (stdio transport)
python scripts/mcp_server.py
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"ikb": {
"command": "python",
"args": ["/absolute/path/to/internal-knowledge-base/scripts/mcp_server.py"]
}
}
}
Then ask Claude things like "What RPO handles GI Bill claims in Texas?" — the MCP server returns ranked chunks with citations.
Ingestion (one-time per corpus):
graph LR
A[KnowVA crawler<br/>HTML + PDF] --> B[Source-specific<br/>preprocessor]
B --> C[Structure-aware<br/>chunker]
C --> D[mxbai-embed-large<br/>local, 1024-dim]
D --> E[(pgvector)]
F[Anthropic Contextual<br/>Retrieval] -.-> E
E -.-> F
style E fill:#e1f5fe
Query (per MCP tool call):
graph LR
A[Claude Desktop<br/>MCP client] --> B[FastMCP server]
B --> C[pgvector top-K]
C --> D[Reranker<br/>mxbai or FlashRank]
D --> E[Claude Sonnet<br/>generation]
E --> A
E --> F[Langfuse trace]
style F fill:#fff9c4
Stack:
content_tsv GIN index for hybrid-readyquery), Resources (document://{source_id}), Prompts (cite_from_chunks)Full 110-question golden set, contextualized chunks + reranker:
| Metric | Score |
|---|---|
| Faithfulness | 0.95 |
| Answer Relevance | 0.91 |
| Context Precision | 0.61 |
| Context Recall | 0.52 |
| Context Relevance | 0.56 |
🔗 Live Langfuse trace (public, no login).
Notable result: Anthropic's Contextual Retrieval pattern produced modest lift on top of reranking (+4.8pp AnsRel, +4.1pp CtxPrec) at this scale — well short of the +35% recall their published number