A standalone MCP server that provides on-device Vision Framework access for PDF and image text extraction. Uses Apple's Vision OCR exclusively -- no cloud services, no API keys, no data leaves your machine.
Built with Swift 6.3, macOS 26, and the MCP Swift SDK.
Two independent parsers, each producing structured PageExtraction results:
- PDFParser: renders each page, then runs RecognizeDocumentsRequest (macOS 26 Vision API) for structured document OCR. Extracts text, tables, lists, and paragraphs.
- ImageParser: loads the image via CGImageSource, then runs VNRecognizeTextRequest for text OCR. Supports PNG, JPEG, TIFF, BMP, GIF, HEIC, and WebP.

Both paths produce extracted text, confidence scores, and automatic text chunking with configurable overlap. The server is read-only -- it extracts and returns data with no persistence or database.
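To make "chunking with configurable overlap" concrete, here is a minimal sketch of one way it could work. It is not taken from the VisionMCP source: the `Chunk` type, the `chunk(_:maxTokens:overlap:)` function, and the choice to treat whitespace-separated words as tokens are all assumptions made for illustration.

```swift
import Foundation

/// Hypothetical chunk type; field names loosely mirror the tool output.
struct Chunk: Equatable {
    let index: Int
    let content: String
    let tokenCount: Int
}

/// Splits text into chunks of at most `maxTokens` tokens. Each chunk after the
/// first repeats the last `overlap` tokens of the previous chunk.
/// Assumption: a "token" is a whitespace-separated word.
func chunk(_ text: String, maxTokens: Int, overlap: Int) -> [Chunk] {
    precondition(maxTokens > 0, "maxTokens must be positive")
    precondition(overlap >= 0 && overlap < maxTokens, "overlap must be smaller than maxTokens")

    let tokens = text.split(whereSeparator: { $0.isWhitespace }).map(String.init)
    guard !tokens.isEmpty else { return [] }

    // Each new chunk starts `step` tokens after the previous one.
    let step = maxTokens - overlap
    var chunks: [Chunk] = []
    var start = 0
    while start < tokens.count {
        let end = min(start + maxTokens, tokens.count)
        let slice = tokens[start..<end]
        chunks.append(Chunk(index: chunks.count,
                            content: slice.joined(separator: " "),
                            tokenCount: slice.count))
        if end == tokens.count { break } // the last chunk reached the end of the text
        start += step
    }
    return chunks
}
```

With `maxTokens: 3` and `overlap: 1`, the input `"a b c d e"` yields two chunks, `"a b c"` and `"c d e"`, so the boundary word `c` appears in both.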
git clone https://codeberg.org/<your-user>/VisionMCP.git
cd VisionMCP
swift build -c release
The release binary is at .build/release/VisionMCP. Symlink it into /usr/local/bin so it is on your PATH:
sudo ln -sf "$(pwd)/.build/release/VisionMCP" /usr/local/bin/visionmcp
Verify:
visionmcp --version
Add to your project's opencode.json:
{
  "mcp": {
    "visionmcp": {
      "type": "local",
      "command": ["/usr/local/bin/visionmcp"],
      "enabled": true
    }
  }
}
Or add to your global ~/.config/opencode/opencode.json to make it available across all projects.
ingest_pdf
Extracts text from a PDF document using Vision OCR. Returns extracted text, chunks, and metadata.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| file_path | string | yes | Absolute path to the PDF file |
Returns:
- raw_text -- full extracted text
- chunks -- text split into token-limited chunks with overlap
- pages -- per-page extraction with text, confidence, tables, lists, paragraphs
- file_hash -- SHA-256 hash of the file
- page_count, chunk_count, status

ingest_image
Extracts text from an image file using Vision OCR. Returns extracted text and metadata.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| file_path | string | yes | Absolute path to the image file |
Supports: PNG, JPEG, TIFF, BMP, GIF, HEIC, WebP. Max file size: 250 MB.
Returns: Same structure as ingest_pdf.
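A caller may want to check the documented constraints (absolute path, supported format, 250 MB limit) before invoking ingest_image. The sketch below is one way to do that on the client side. The `ImageLimits` values, the extension list, the `ImageInputError` cases, and the reading of 250 MB as 250 × 1024 × 1024 bytes are all assumptions, not the server's actual validation logic.

```swift
import Foundation

/// Hypothetical limits derived from the documentation above.
enum ImageLimits {
    /// Assumed file extensions for the listed formats; the server's real list may differ.
    static let supportedExtensions: Set<String> =
        ["png", "jpg", "jpeg", "tif", "tiff", "bmp", "gif", "heic", "webp"]
    /// Assumption: "250 MB" means 250 * 1024 * 1024 bytes.
    static let maxFileSizeBytes = 250 * 1024 * 1024
}

enum ImageInputError: Error, Equatable {
    case notAbsolute
    case unsupportedFormat(String)
    case tooLarge(Int)
}

/// Throws if the path is relative, the extension is not in the assumed list,
/// or the size exceeds the assumed limit.
func validateImageInput(path: String, sizeInBytes: Int) throws {
    guard path.hasPrefix("/") else { throw ImageInputError.notAbsolute }

    let ext = URL(fileURLWithPath: path).pathExtension.lowercased()
    guard ImageLimits.supportedExtensions.contains(ext) else {
        throw ImageInputError.unsupportedFormat(ext)
    }
    guard sizeInBytes <= ImageLimits.maxFileSizeBytes else {
        throw ImageInputError.tooLarge(sizeInBytes)
    }
}
```

Passing the size as a parameter keeps the check free of file-system access, which makes it easy to unit test.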
{
  "file_name": "invoice-001.jpeg",
  "page_count": 1,
  "chunk_count": 2,
  "file_hash": "a258e31c...",
  "raw_text": "Invoice text here...",
  "chunks": "[{\"chunk_index\":0,\"content\":\"...\",\"token_count\":558}]",
  "pages": "[{\"page_number\":1,\"text\":\"...\",\"confidence\":0.97}]",
  "status": "extracted"
}
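In the example above, `chunks` and `pages` are JSON-encoded strings rather than nested arrays, so a client has to decode twice. The following sketch shows that two-step decode with Foundation's `JSONDecoder`. The `IngestResponse` and `ChunkRecord` types and the `decodeChunks(from:)` function are hypothetical names, and only the fields visible in the example are modeled.

```swift
import Foundation

/// One entry of the `chunks` array (fields taken from the example response).
struct ChunkRecord: Decodable {
    let chunkIndex: Int
    let content: String
    let tokenCount: Int
}

/// Outer response; `chunks` arrives as a string that itself contains JSON.
struct IngestResponse: Decodable {
    let fileName: String
    let status: String
    let chunks: String
}

/// Decodes the outer object first, then decodes the embedded `chunks` string.
func decodeChunks(from responseJSON: Data) throws -> [ChunkRecord] {
    let decoder = JSONDecoder()
    // Maps snake_case keys such as "chunk_index" onto camelCase properties.
    decoder.keyDecodingStrategy = .convertFromSnakeCase

    let response = try decoder.decode(IngestResponse.self, from: responseJSON)
    return try decoder.decode([ChunkRecord].self, from: Data(response.chunks.utf8))
}
```

The same pattern would apply to `pages`, using a second record type for the per-page fields.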
VisionMCP
├── PDFParser # Renders pages, runs RecognizeDocumentsRequest
├── PDFDocumentActor # Thread-safe PDFDocument wrapper (Sendable)
├── ImageParser # Loads images, runs VNRecognizeTextRequest
├── TextChunker # Splits text into overlapping token-limited chunks
├── IngestService # Orchestrates parsing + chunking
├── IngestTools # MCP tool definitions + handlers
├── ToolRegistry # Wires MCP server to tools
└── main.swift # Entry point, stdio transport
No shared protocol, no factory, no reconciliation. Each tool routes directly to its parser.
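As a rough illustration of direct routing without a shared protocol or factory, a handler can switch on the tool name and call the matching parser. This sketch is not the project's code: `handleToolCall`, `ToolError`, and the two stub parser functions are hypothetical stand-ins for the real PDFParser and ImageParser.

```swift
enum ToolError: Error {
    case unknownTool(String)
}

// Stub parsers standing in for the real OCR paths (assumptions for illustration).
func parsePDF(at path: String) -> String { "pdf:\(path)" }
func parseImage(at path: String) -> String { "image:\(path)" }

/// Routes a tool call straight to its parser; there is no intermediate abstraction.
func handleToolCall(name: String, filePath: String) throws -> String {
    switch name {
    case "ingest_pdf":
        return parsePDF(at: filePath)
    case "ingest_image":
        return parseImage(at: filePath)
    default:
        throw ToolError.unknownTool(name)
    }
}
```

With only two tools, a plain switch keeps the mapping visible in one place. The trade-off is that every new tool needs a new case.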
swift build
swift test
Tests use Swift Testing (import Testing, @Test, #expect).
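For readers new to Swift Testing, a test in that style looks roughly like the sketch below. The `tokenCount(of:)` helper is a hypothetical function defined only for this example. It is not part of VisionMCP.

```swift
import Testing

/// Hypothetical helper under test: counts whitespace-separated tokens.
func tokenCount(of text: String) -> Int {
    text.split(whereSeparator: { $0.isWhitespace }).count
}

@Test func countsWhitespaceSeparatedTokens() {
    // Consecutive spaces do not produce empty tokens.
    #expect(tokenCount(of: "Invoice  total: 42") == 3)
}

@Test func whitespaceOnlyTextHasNoTokens() {
    #expect(tokenCount(of: "   ") == 0)
}
```

Functions marked `@Test` are discovered automatically by `swift test`, and `#expect` records a failure without stopping the rest of the test.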
swift run VisionMCP