Config is the same across clients — only the file and path differ.
{
  "mcpServers": {
    "io-github-arknill-docpick": {
      "args": [
        "docpick"
      ],
      "command": "uvx"
    }
  }
}
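Because the entry is identical for every client, adding it amounts to merging one key into whichever JSON file the client reads. The sketch below illustrates that merge with the standard library only; the helper name `add_mcp_server` and the file name `mcp.json` are hypothetical, and the real config path depends on your client.

```python
import json
import tempfile
from pathlib import Path

# The server entry from the listing above.
DOCPICK_ENTRY = {
    "io-github-arknill-docpick": {"command": "uvx", "args": ["docpick"]}
}


def add_mcp_server(config_path: Path, entry: dict) -> dict:
    """Merge `entry` into the "mcpServers" object of a client config file.

    Hypothetical helper: existing servers are kept, and the file is created
    if it does not exist yet.
    """
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {}).update(entry)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file; point config_path at your client's file instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "mcp.json"  # hypothetical file name
    demo_path.write_text(
        '{"mcpServers": {"other": {"command": "npx", "args": []}}}',
        encoding="utf-8",
    )
    merged = add_mcp_server(demo_path, DOCPICK_ENTRY)
    print(sorted(merged["mcpServers"]))
```

Merging rather than overwriting keeps any servers that are already configured in the same file.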
Document in, Structured JSON out. Locally. With your schema.
Run this in your terminal to verify the server starts:
uvx 'docpick' 2>&1 | head -1 && echo "✓ Server started successfully"
No known CVEs.
Checked docpick against OSV.dev.
docpick is a lightweight, schema-driven document extraction pipeline that combines local OCR engines with local LLMs to extract structured JSON from any document — invoices, receipts, bills of lading, tax forms, and more.
pip install docpick              # core (LLM extraction only)
pip install "docpick[paddle]"    # + PaddleOCR (recommended)
pip install "docpick[easyocr]"   # + EasyOCR (Korean-optimized)
pip install "docpick[got]"       # + GOT-OCR2.0 (GPU, vision-language)
pip install "docpick[all]"       # all OCR backends
Requirements: Python 3.11+ / LLM endpoint (vLLM, Ollama, or OpenAI-compatible)
from docpick import DocpickPipeline
from docpick.schemas import InvoiceSchema
pipeline = DocpickPipeline()
result = pipeline.extract("invoice.pdf", schema=InvoiceSchema)
print(result.data) # Structured dict matching schema
print(result.validation) # Validation errors/warnings
print(result.confidence) # Per-field confidence scores
# Extract structured data
docpick extract invoice.pdf --schema invoice --output result.json
# OCR only (no LLM)
docpick ocr document.png --lang ko,en
# Validate extracted JSON
docpick validate result.json --schema invoice
# Batch process a directory
docpick batch ./documents/ --schema invoice --output ./results/ --concurrency 4
# List available schemas
docpick schemas list
# Show schema details
docpick schemas show invoice
| Schema | Document Type | Key Validations |
|---|---|---|
| invoice | Commercial invoices | Line item sums, tax ID checkdigit, date order |
| receipt | Retail/restaurant receipts | Total = subtotal + tax + tip |
| bill_of_lading | Ocean/air B/L | Container weight sums, ISO 6346, HS code format |
| purchase_order | Purchase orders | PO total = line items, delivery date order |
| kr_tax_invoice | Korean e-tax invoice (세금계산서) | Business number checkdigit (x2), supply/tax/total sums |
| bank_statement | Bank statements | IBAN mod97, period date order |
| id_document | Passport/ID (ICAO 9303) | MRZ, ISO 3166 country codes, date ranges |
| certificate_of_origin | Certificate of Origin | ISO 3166 alpha-2 country codes |
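Several of the validations named in the table are standard check-digit algorithms. As a rough illustration of what such checks compute, here is a standalone sketch of three of them: IBAN mod 97, the ISO 6346 container check digit, and the ICAO 9303 MRZ check digit. The function names are hypothetical and this is not docpick's own code; it only follows the published algorithms.

```python
import string


def iban_is_valid(iban: str) -> bool:
    """IBAN mod-97 check: the rearranged IBAN, read as an integer, is 1 mod 97."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def _iso6346_letter_values() -> dict[str, int]:
    """A=10, B=12, ... Z=38: values count up from 10, skipping multiples of 11."""
    values, value = {}, 10
    for letter in string.ascii_uppercase:
        if value % 11 == 0:
            value += 1
        values[letter] = value
        value += 1
    return values


ISO6346_VALUES = _iso6346_letter_values()


def iso6346_check_digit(first_ten: str) -> int:
    """Check digit for the first 10 characters of a container number."""
    total = 0
    for position, ch in enumerate(first_ten.upper()):
        value = int(ch) if ch.isdigit() else ISO6346_VALUES[ch]
        total += value * 2 ** position
    return total % 11 % 10


def mrz_check_digit(field: str) -> int:
    """ICAO 9303 check digit: weights 7, 3, 1 repeating; '<' counts as 0."""
    weights = (7, 3, 1)
    total = 0
    for index, ch in enumerate(field.upper()):
        value = 0 if ch == "<" else int(ch, 36)
        total += value * weights[index % 3]
    return total % 10


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))  # True
print(iso6346_check_digit("CSQU305438"))             # 3
print(mrz_check_digit("L898902C3"))                  # 6
```

Checks like these are cheap to run on extracted fields, which is why they are useful for catching OCR misreads in identifiers.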
Define your own schema with Pydantic:
from pydantic import BaseModel

from docpick import DocpickPipeline
from docpick.validation.rules import SumEqualsRule, RequiredFieldRule


class MyDocument(BaseModel):
    """Custom document schema."""

    company_name: str | None = None
    total_amount: float | None = None
    tax_amount: float | None = None
    net_amount: float | None = None
    items: list[dict] | None = None

    class ValidationRules:
        rules = [
            RequiredFieldRule("company_name"),
            SumEqualsRule(["net_amount", "tax_amount"], "total_amount"),
        ]


pipeline = DocpickPipeline()
result = pipeline.extract("my_document.pdf", schema=MyDocument)
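To make the intent of a rule such as `SumEqualsRule(["net_amount", "tax_amount"], "total_amount")` concrete, the following plain-Python sketch shows the kind of comparison a sum rule implies. It does not use docpick, and the function name `sum_equals`, the tolerance, and the handling of missing fields are all assumptions rather than the library's documented behavior.

```python
from decimal import Decimal


def sum_equals(data: dict, parts: list[str], total_field: str,
               tolerance: str = "0.01") -> bool:
    """Return True when the part fields add up to the total within `tolerance`.

    Hypothetical helper. Values are converted through str() into Decimal so
    that float inputs such as 0.1 + 0.2 compare the way a person would expect.
    """
    values = [data.get(name) for name in (*parts, total_field)]
    if any(value is None for value in values):
        return False  # assumption: a missing field counts as a failed check
    *part_values, total = (Decimal(str(value)) for value in values)
    return abs(sum(part_values) - total) <= Decimal(tolerance)


extracted = {"net_amount": 100.0, "tax_amount": 10.0, "total_amount": 110.0}
print(sum_equals(extracted, ["net_amount", "tax_amount"], "total_amount"))  # True
```

A small tolerance is a common choice for monetary sums because rounding on the printed document can differ from rounding in the arithmetic.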
Or use a JSON Schema file:
docpick extract document.pdf --schema my_schema.json
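For reference, a JSON Schema file mirroring the `MyDocument` fields could be produced as in the sketch below. It uses only the standard library and plain JSON Schema keywords; which draft and which keywords docpick actually reads is not stated here, so treat the exact shape as an assumption.

```python
import json
from pathlib import Path

# Hand-written JSON Schema with the same nullable fields as MyDocument.
# Assumption: docpick accepts a plain object schema like this one.
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "MyDocument",
    "type": "object",
    "properties": {
        "company_name": {"type": ["string", "null"]},
        "total_amount": {"type": ["number", "null"]},
        "tax_amount": {"type": ["number", "null"]},
        "net_amount": {"type": ["number", "null"]},
        "items": {"type": ["array", "null"], "items": {"type": "object"}},
    },
}

schema_path = Path("my_schema.json")
schema_path.write_text(json.dumps(schema, indent=2) + "\n", encoding="utf-8")
print(f"wrote {schema_path} with {len(schema['properties'])} properties")
```

The resulting file can then be passed to the `--schema` option shown above.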
| Algorithm | Use Case |
|---|---|