The configuration is the same across MCP clients; only the config file's name and location differ.
{
  "mcpServers": {
    "pdf-mcp": {
      "command": "pdf-mcp"
    }
  }
}
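Because only the file location changes between clients, the entry can be merged into an existing config with a few lines of Python. This is a minimal sketch: the helper name `add_pdf_mcp` and the path you pass in are assumptions for illustration, and it presumes the client stores its servers under a top-level `mcpServers` key as in the snippet above.

```python
import json
from pathlib import Path


def add_pdf_mcp(config_path: Path) -> dict:
    """Merge the pdf-mcp entry into a client config file, keeping other servers.

    Hypothetical helper: config_path is whatever file your MCP client reads.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Keep any servers that are already configured; add or overwrite pdf-mcp.
    servers = config.setdefault("mcpServers", {})
    servers["pdf-mcp"] = {"command": "pdf-mcp"}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the config path documented by your client, for example `add_pdf_mcp(Path("path/to/client/config.json"))`. Most clients need a restart before they pick up the new entry.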
A Model Context Protocol (MCP) server that enables AI agents to read, search, and extract content from PDF files. Built with Python and PyMuPDF, with SQLite-based caching for persistence across server restarts.
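To illustrate why an on-disk SQLite cache survives server restarts, here is a minimal sketch of a page-text cache. It is not pdf-mcp's actual schema: the `page_text` table, the functions `open_cache` and `cached_page_text`, and the `extract` callback (for instance a thin wrapper around PyMuPDF) are all hypothetical names chosen for this example.

```python
import os
import sqlite3
from typing import Callable


def open_cache(db_path: str) -> sqlite3.Connection:
    """Open (or create) the on-disk cache; the file persists across restarts."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_text ("
        " path TEXT, mtime_ns INTEGER, size INTEGER, page INTEGER, text TEXT,"
        " PRIMARY KEY (path, mtime_ns, size, page))"
    )
    return conn


def cached_page_text(
    conn: sqlite3.Connection,
    pdf_path: str,
    page: int,
    extract: Callable[[str, int], str],
) -> str:
    """Return the text of one page, calling extract() only on a cache miss."""
    st = os.stat(pdf_path)
    # Including mtime and size in the key invalidates entries when the file changes.
    key = (os.path.abspath(pdf_path), st.st_mtime_ns, st.st_size, page)
    row = conn.execute(
        "SELECT text FROM page_text"
        " WHERE path = ? AND mtime_ns = ? AND size = ? AND page = ?",
        key,
    ).fetchone()
    if row is not None:
        return row[0]

    text = extract(pdf_path, page)  # hypothetical extractor supplied by the caller
    conn.execute("INSERT INTO page_text VALUES (?, ?, ?, ?, ?)", key + (text,))
    conn.commit()
    return text
```

Keying on path, modification time, and size is one simple invalidation strategy. A content hash would be stricter, at the cost of reading the whole file on every lookup.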
Run this in your terminal to verify that the server starts:
npx -y 'pdf-mcp' 2>&1 | head -1 && echo "✓ Server started successfully"
No known CVEs.
Checked pdf-mcp against OSV.dev.
Surgical PDF access for AI agents — search, read, and extract without flooding context.
An MCP server that lets Claude Code and other AI agents search a PDF by meaning or keyword, read only the pages that matter, and cleanly pull out tables, images, and scanned text — even from multi-column and Japanese layouts.
mcp-name: io.github.jztan/pdf-mcp
Drop in any PDF and watch an agent skim it, search it, and read only the pages that matter — using a fraction of the tokens. 100% client-side, no install required.
| | Without pdf-mcp | With pdf-mcp |
|---|---|---|
| Large PDFs | Context overflow | Chunked reading |
| Token budgeting | Guess and overflow | Estimated tokens before reading |
| Finding content | Load everything | Hybrid search (BM25 keyword + semantic) |
| Tables | Lost in raw text | Extracted and inlined per page |
| Multi-column PDFs | Columns interleaved in extracted text | Column-aware reading order (`pdf-mcp[multicolumn]`) |
| Vertical scripts (Japanese) | Columns scrambled / glyph soup | Geometric reorder of vertical text (tategaki / 縦書き); CJK keyword search works on unspaced Japanese/Chinese/Korean text via a char-split FTS index |
| Images | Ignored | Extracted as PNG files |
| Repeated access | Re-parse every time | SQLite cache |
| Scanned PDFs | No text extracted | OCR via Tesseract, parallelized across pages (`pdf_read_pages(ocr=True)`) |
| Visual content | Must describe in words | Render page as image (`pdf_render_pages`) |
| Hidden / injected text | Silently ingested as if a human vetted it | Flagged as untrusted: hidden-text detection (`content_trust=True`) |
| Tool design | Single monolithic tool | 9 specialized tools |
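The "Estimated tokens before reading" row can be pictured with a short sketch. It assumes a rough heuristic of about four characters per token, which is not necessarily how pdf-mcp computes its estimates. The functions `estimate_tokens` and `pages_within_budget` are hypothetical and are not part of the server's tool set.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token, rounded up (an assumption)."""
    return (len(text) + 3) // 4


def pages_within_budget(page_texts: List[str], wanted: List[int], budget: int) -> List[int]:
    """Return the leading pages of `wanted` whose estimated tokens fit in `budget`."""
    chosen: List[int] = []
    used = 0
    for page in wanted:
        cost = estimate_tokens(page_texts[page])
        if used + cost > budget:
            break  # stop before overflowing; the rest can be read in a later chunk
        chosen.append(page)
        used += cost
    return chosen
```

An agent that knows the per-page estimate up front can request the pages that fit and defer the rest, instead of loading the whole document and overflowing its context.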