{
  "mcpServers": {
    "pdf-ocr-mcpserver": {
      "command": "<see-readme>",
      "args": []
    }
  }
}
No install config available. Check the server's README for setup instructions.
MCP server for class 12 physics students.
Is it safe?
No package registry to scan.
No authentication — any process on your machine can connect.
License not specified.
Is it maintained?
Last commit 13 days ago. 1 star.
Will it work with my client?
Transport: stdio. Works with Claude Desktop, Cursor, Claude Code, and most MCP clients.
No automated test available for this server. Check the GitHub README for setup instructions.
No known vulnerabilities.
A production-oriented Model Context Protocol (MCP) server for working with OCR text extracted from PDF page images. The repository includes an OCR pipeline for generating page text files and an HTTP-based MCP server built with Express and the official MCP TypeScript SDK.
This project is designed for local corpora that have already been split into page images. It exposes that OCR corpus through MCP tools, resources, and prompts so MCP clients can search, inspect, and summarize page content efficiently.
The OCR pipeline writes page text to .txt files with Tesseract.
|-- pages/ # Source page images (.png)
|-- texts/ # OCR output text files (.txt)
|-- src/
| |-- scripts/
| | `-- index.ts # OCR generation script
| `-- tools/
| |-- http-server.ts # Express + HTTP MCP server entrypoint
| |-- mcp-server.ts # MCP tools/resources/prompts registration
| |-- text-repository.ts
| `-- tools.ts # Compatibility entrypoint
|-- package.json
`-- tsconfig.json
The codebase is split into three clean layers:
TextRepository
Handles safe file resolution, page listing, cached reads, range reads, and text search.
createTextMcpServer
Defines the MCP contract exposed to clients, including tools, resource templates, and prompts.
http-server
Hosts the MCP server over Express using Streamable HTTP transport, manages sessions, and exposes operational endpoints.
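As an illustration of the "safe file resolution" responsibility described for TextRepository, here is a minimal sketch of one common way to keep reads inside a base directory. The function name `resolveInside` and its error message are hypothetical and are not taken from the repository; the actual implementation may differ.

```typescript
import * as path from "node:path";

// Hypothetical helper (not the repository's code): resolve a page file name
// inside a base directory and reject anything that would escape it.
export function resolveInside(baseDir: string, fileName: string): string {
  const base = path.resolve(baseDir);
  const candidate = path.resolve(base, fileName);
  const relative = path.relative(base, candidate);

  // "" means the base directory itself; ".." or "../..." means a parent;
  // an absolute result can occur on Windows when the drive differs.
  const escapes =
    relative === "" ||
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);

  if (escapes) {
    throw new Error(`Refusing to resolve outside ${base}: ${fileName}`);
  }
  return candidate;
}
```

With this sketch, `resolveInside("texts", "page-001.txt")` returns an absolute path under `texts/`, while `resolveInside("texts", "../secret.txt")` throws.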
Tesseract must be installed and available on your PATH. The OCR script currently invokes tesseract directly, so the binary must be accessible from your shell.
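To make the dependency on the `tesseract` binary concrete, the sketch below builds the arguments for a single invocation. It relies only on the standard Tesseract CLI form `tesseract <image> <outputbase>`, where Tesseract appends `.txt` to the output base itself. The helper name `tesseractArgs` and the example file names are assumptions for illustration; the repository's script may be structured differently.

```typescript
import * as path from "node:path";

// Hypothetical helper (not the repository's code): build the argument list
// for one `tesseract` call. The output base has no extension because
// Tesseract adds ".txt" on its own.
export function tesseractArgs(imagePath: string, outDir: string): string[] {
  const baseName = path.basename(imagePath, path.extname(imagePath));
  return [imagePath, path.join(outDir, baseName)];
}

// Example wiring, shown as a comment so the sketch has no side effects:
//   import { execFile } from "node:child_process";
//   execFile("tesseract", tesseractArgs("pages/page-001.png", "texts"), (err) => {
//     if (err) console.error("tesseract failed; is it on your PATH?", err);
//   });
```

Using `execFile` rather than a shell string avoids quoting problems with file names that contain spaces.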
npm install
If your page images already exist in pages/, generate text files with:
npm run generate:textfiles
This will:
- read .png files from pages/
- write .txt files into texts/

Start the HTTP-based MCP server with:
npm run start
The server defaults to:
- http://127.0.0.1:3000/mcp
- http://127.0.0.1:3000/healthz
- http://127.0.0.1:3000/readyz

The server supports the following environment variables:
| Variable | Default | Description |
| ------------------- | ----------- | ------------------------------------------ |
| MCP_HOST | 127.0.0.1 | Host interface to bind the server to |
| MCP_PORT | 3000 | Port used by the HTTP MCP server |
| MCP_BODY_LIMIT | 1mb | Maximum JSON request body size |
| MCP_ALLOWED_HOSTS | empty | Optional comma-separated host allowlist |
| MCP_PRELOAD_CACHE | true | Preload text content into cache on startup |
Example:
MCP_HOST=127.0.0.1
MCP_PORT=3000
MCP_BODY_LIMIT=1mb
MCP_PRELOAD_CACHE=true
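The following sketch shows one way these variables and their documented defaults could be read into a typed configuration object. The `ServerConfig` interface, the `loadConfig` function, and the parsing rules (for example, treating only the string `false` as disabling preload) are assumptions for illustration, not the server's actual code.

```typescript
// Hypothetical shape for the settings in the table above.
interface ServerConfig {
  host: string;
  port: number;
  bodyLimit: string;
  allowedHosts: string[];
  preloadCache: boolean;
}

// Hypothetical loader: applies the documented defaults when a variable is unset.
export function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const port = Number.parseInt(env["MCP_PORT"] ?? "3000", 10);
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new Error(`Invalid MCP_PORT: ${env["MCP_PORT"]}`);
  }
  return {
    host: env["MCP_HOST"] ?? "127.0.0.1",
    port,
    bodyLimit: env["MCP_BODY_LIMIT"] ?? "1mb",
    // Comma-separated allowlist; an unset or empty value yields an empty array.
    allowedHosts: (env["MCP_ALLOWED_HOSTS"] ?? "")
      .split(",")
      .map((host) => host.trim())
      .filter((host) => host.length > 0),
    // Assumption: anything other than "false" keeps the default of true.
    preloadCache: (env["MCP_PRELOAD_CACHE"] ?? "true").toLowerCase() !== "false",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`.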
list_text_pages
Lists available OCR text files with pagination support.
read_text_page
Reads a single OCR page by file name and can optionally truncate the response.
read_text_range
Reads a bounded character range from a page for more efficient retrieval.
search_text_pages
Searches the corpus and returns contextual snippets for matching pages.
get_corpus_stats
Returns repository and cache metrics for diagnostic