Production-ready RAG + MCP demo: eval-in-CI merge gate, Langfuse traces, structure-aware chunking.
A public Retrieval-Augmented Generation pipeline exposed as an MCP server. Sample content from Veterans Affairs education manuals.
The repo implements evaluation, observability, and structure-aware ingestion. Cost/latency tuning, tenant-level access control, and other production concerns are discussed in the article linked below.
📖 Full writeup on Medium: Enterprise Internal Knowledge Base RAG MCP: POC-to-Production
RAG demos tend to focus on the quality of the retrieval pipeline, without recognizing that production RAG fails on the next ten steps: prompt or model changes that pass code review but tank answer quality, cost and latency drift that can't be traced back to specific queries, and cross-tenant leakage that only surfaces in an audit. This repo shows what catching those failures looks like in practice.
The corpus is public (VA Education manuals — 238 documents, 9,000+ chunks) so anyone can clone, run, and adapt the pipeline.
```bash
git clone https://github.com/kimsb2429/internal-knowledge-base
cd internal-knowledge-base

# 1. Start Postgres + pgvector
docker compose up -d

# 2. Python env + dependencies
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 3. Restore corpus fixture (~2 min — 238 docs + 9k chunks pre-embedded)
docker exec -i ikb_pgvector pg_restore -U ikb -d ikb < evals/fixture_v1.dump

# 4. Smoke-test the MCP server
python scripts/test_mcp_server.py   # 7/7 tests pass

# 5. Start the MCP server (stdio transport)
python scripts/mcp_server.py
```
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "ikb": {
      "command": "python",
      "args": ["/absolute/path/to/internal-knowledge-base/scripts/mcp_server.py"]
    }
  }
}
```
Then ask Claude things like "What RPO handles GI Bill claims in Texas?" — the MCP server returns ranked chunks with citations.
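Under the hood, Claude Desktop talks to the server over stdio using JSON-RPC 2.0, invoking tools via the MCP `tools/call` method. A minimal sketch of building such a request — note the tool name `search` and the `query`/`top_k` argument names are assumptions for illustration; the real tool schema lives in `scripts/mcp_server.py`:

```python
import json

def tools_call_request(query: str, request_id: int = 1) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message.

    The tool name "search" and the "query"/"top_k" arguments are
    hypothetical -- the actual schema is defined by the server.
    """
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search",
            "arguments": {"query": query, "top_k": 5},
        },
    }
    return json.dumps(msg)

print(tools_call_request("What RPO handles GI Bill claims in Texas?"))
```

The MCP client frames and sends this over the server's stdin; the ranked chunks come back as the tool result in the JSON-RPC response.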
Ingestion (one-time per corpus):
```mermaid
graph LR
  A[KnowVA crawler<br/>HTML + PDF] --> B[Source-specific<br/>preprocessor]
  B --> C[Structure-aware<br/>chunker]
  C --> D[mxbai-embed-large<br/>local, 1024-dim]
  D --> E[(pgvector)]
  F[Anthropic Contextual<br/>Retrieval] -.-> E
  E -.-> F
  style E fill:#e1f5fe
```
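"Structure-aware" here means splitting on document structure rather than fixed character windows, so each chunk carries its heading path as metadata. A minimal sketch of the idea for markdown input — the repo's actual chunker also handles HTML and PDF and enforces size limits, so treat this as illustrative only:

```python
import re

def structure_aware_chunks(markdown: str) -> list[dict]:
    """Split a markdown document at headings, keeping the heading path."""
    path: list[str] = []      # current heading stack, e.g. ["Ch 3", "3.1"]
    chunks: list[dict] = []
    buf: list[str] = []

    def flush() -> None:
        text = "\n".join(buf).strip()
        if text:
            chunks.append({"heading_path": " > ".join(path), "text": text})
        buf.clear()

    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            flush()                      # close the chunk under the old heading
            level = len(m.group(1))
            del path[level - 1:]         # pop headings at this depth or deeper
            path.append(m.group(2).strip())
        else:
            buf.append(line)
    flush()
    return chunks
```

Carrying the heading path into each chunk is what lets the retriever cite "Chapter 3 > Eligibility" rather than an anonymous text window.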
Query (per MCP tool call):
```mermaid
graph LR
  A[Claude Desktop<br/>MCP client] --> B[FastMCP server]
  B --> C[pgvector top-K]
  C --> D[Reranker<br/>mxbai or FlashRank]
  D --> E[Claude Sonnet<br/>generation]
  E --> A
  E --> F[Langfuse trace]
  style F fill:#fff9c4
```
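What the pgvector top-K step computes, sketched as a brute-force cosine-similarity scan in plain Python — pgvector does this inside Postgres with an HNSW or IVFFlat index instead of a linear scan, and the toy 2-dim vectors stand in for the 1024-dim mxbai embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], chunks: list[tuple], k: int = 3) -> list[tuple]:
    """chunks: list of (chunk_id, embedding). Returns the best k by cosine."""
    scored = [(cid, cosine(query_vec, emb)) for cid, emb in chunks]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]
```

The reranker then rescores just these K candidates with a heavier model, which is why the cheap first-stage recall matters more than its precision.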
Stack:

- `content_tsv` GIN index for hybrid-ready query
- Resources (`document://{source_id}`)
- Prompts (`cite_from_chunks`)

Full 110-question golden set, contextualized chunks + reranker:
| Metric | Score |
|---|---|
| Faithfulness | 0.95 |
| Answer Relevance | 0.91 |
| Context Precision | 0.61 |
| Context Recall | 0.52 |
| Context Relevance | 0.56 |
🔗 Live Langfuse trace (public, no login).
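Context Precision and Context Recall above are RAGAS-style metrics. Stripped of the LLM-as-judge machinery, the underlying arithmetic is simple set math; a simplified, unweighted sketch assuming relevance labels are already known (RAGAS's actual Context Precision is rank-weighted, and the repo's eval judges relevance with an LLM against the 110-question golden set):

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are relevant to the question."""
    if not retrieved:
        return 0.0
    return sum(c in relevant for c in retrieved) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant chunks that retrieval actually surfaced."""
    if not relevant:
        return 0.0
    return sum(c in relevant for c in set(retrieved)) / len(relevant)
```

The precision/recall gap in the table (0.61 vs 0.52) is typical: retrieval surfaces mostly-relevant chunks but still misses part of the golden context.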
Notable result: Anthropic's Contextual Retrieval pattern produced modest lift on top of reranking (+4.8pp AnsRel, +4.1pp CtxPrec) at this scale — well short of the +35% recall their published numbers suggested. Rep