One MCP, many parsers. The default is markitdown (free, fast, MIT). Escalate to Docling (table-heavy or scanned PDFs) or LlamaParse (cloud, bring your own key) when markitdown's quality isn't enough. Plus an `interpret` tool that pipes parsed markdown into Claude for "summarize / extract X" requests, so you stop juggling parsers and Anthropic skills.
Open Claude Code and paste:

```
/plugin marketplace add adelaidasofia/parse-mcp
/plugin install parse-mcp@parse-mcp
```
Manual install (pre-plugin-marketplace). See `SETUP.md` for full details.

```shell
pip3 install --break-system-packages -r requirements.txt
pip3 install --break-system-packages 'markitdown[pdf,docx,pptx,xlsx]'
```
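To confirm which parser packages the `python3` you installed into can actually see, a short check along these lines may help. It is a sketch, not the server's `list_backends()` implementation, and it assumes the import names `markitdown`, `docling`, and `llama_parse` for the three backends.

```python
import importlib.util

# Assumed import names for each backend (the pip package name can differ).
BACKEND_MODULES = {
    "markitdown": "markitdown",
    "docling": "docling",
    "llamaparse": "llama_parse",
}


def installed_backends() -> dict[str, bool]:
    """Return {backend: True/False} depending on whether its module can be found."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in BACKEND_MODULES.items()
    }


if __name__ == "__main__":
    for name, ok in installed_backends().items():
        print(f"{name:12} {'installed' if ok else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check stays fast even when Docling is installed.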
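Then register the server in your client's MCP config file (for example `.mcp.json`). The config is the same across clients; only the file and its path differ:

```json
{
  "mcpServers": {
    "parse": {
      "command": "python3",
      "args": ["/absolute/path/to/parse-mcp/server.py"]
    }
  }
}
```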
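If the config file already lists other servers, hand-editing it risks clobbering them. A minimal sketch that merges only the `"parse"` entry, assuming the file is plain JSON with a top-level `"mcpServers"` object; `register_parse_server` is a hypothetical helper, not part of the repo.

```python
import json
from pathlib import Path


def register_parse_server(config_path: Path, server_py: Path) -> dict:
    """Add (or replace) the "parse" entry, leaving other servers untouched."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["parse"] = {
        "command": "python3",
        "args": [str(server_py.resolve())],  # clients need an absolute path
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `register_parse_server(Path(".mcp.json"), Path("parse-mcp/server.py"))` run from the directory that holds your checkout.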
| Tool | What it does |
|---|---|
| `parse(source, backend?, hints?)` | File path or http(s) URL to markdown. Router picks backend, falls back on empty/error. Returns markdown plus a chain of every backend attempted. |
| `parse_url(url, backend?)` | Shortcut for HTTP(S) inputs. Same return shape as `parse`. |
| `parse_to_vault(source, vault_folder?, backend?, overwrite?)` | Parse + write the result as a markdown note in the vault. Default folder: `<VAULT_ROOT>/📥 Inbox/Converted/`. Frontmatter records source, format, backend, latency, `bytes_in`. Replaces the standalone `markitdown_to_vault.py` shell script. |
| `interpret(source, instruction, backend?, model?, max_tokens?)` | Parse first, then ask Claude over the parsed markdown. Cache hits reuse parsed text for free input tokens. |
| `list_backends()` | Which backends are installed + which are missing. Diagnostic. |
| `benchmark(source)` | Run every available backend on the same input. Compare latency + output side by side. |
| `chunk_text(text, doc_type?, target_tokens?, max_tokens?, min_tokens?)` | Chunk parsed markdown into retrieval-ready pieces using a doc-type-aware chunker. `doc_type="auto"` (default) runs structural detection and picks one of `paper` / `book` / `manual` / `qa` / `resume` / `table` / `default`. Each chunker honors document shape (e.g., `paper` keeps the abstract whole; `manual` never merges across numbered sections; `qa` pairs each question with its answer). Returns chunks + the resolved `doc_type`. See the `chunkers/` package. |
| `detect_doc_type(text)` | Diagnostic. Run structural heuristics over markdown and return the `doc_type` that `chunk_text` would pick. |
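The `parse` row describes a router that falls back on empty output or an error and reports every backend it tried. A minimal sketch of that fallback pattern, with stub functions standing in for the real markitdown and Docling calls; the `ParseResult` shape and the status strings are assumptions, not the server's actual return format.

```python
from dataclasses import dataclass, field
from typing import Callable

# A backend takes a source (file path or URL) and returns markdown text.
Backend = Callable[[str], str]


@dataclass
class ParseResult:
    markdown: str
    chain: list[dict] = field(default_factory=list)  # one entry per backend tried


def parse_with_fallback(source: str, backends: list[tuple[str, Backend]]) -> ParseResult:
    """Try backends in order; fall through when one raises or returns blank text."""
    chain: list[dict] = []
    for name, backend in backends:
        try:
            text = backend(source)
        except Exception as exc:  # record the failure, then try the next backend
            chain.append({"backend": name, "status": "error", "detail": str(exc)})
            continue
        if text.strip():
            chain.append({"backend": name, "status": "ok"})
            return ParseResult(markdown=text, chain=chain)
        chain.append({"backend": name, "status": "empty"})
    return ParseResult(markdown="", chain=chain)  # every backend failed or was empty


# Usage with stub backends (hypothetical stand-ins for markitdown and Docling).
def stub_markitdown(source: str) -> str:
    return ""  # e.g. a scanned PDF with no text layer


def stub_docling(source: str) -> str:
    return "# Report\n\n| q | revenue |\n|---|---|\n| Q1 | 10 |"


result = parse_with_fallback("scan.pdf", [("markitdown", stub_markitdown), ("docling", stub_docling)])
print([step["status"] for step in result.chain])  # ['empty', 'ok']
```

Ordering the list cheapest-first mirrors the README's advice: markitdown first, heavier or paid backends only when it comes back empty.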
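`parse_to_vault` writes a note whose frontmatter records the source, format, backend, latency, and `bytes_in`. One way such a note could be assembled is sketched below; the `latency_ms` key, the JSON-quoted values, and naming the note after the source's file stem are all assumptions, and overwrite handling is left out.

```python
import json
from pathlib import Path


def build_vault_note(markdown: str, *, source: str, fmt: str, backend: str,
                     latency_ms: float, bytes_in: int) -> str:
    """Prepend YAML frontmatter describing how the markdown was produced."""
    meta = {
        "source": source,
        "format": fmt,
        "backend": backend,
        "latency_ms": round(latency_ms, 1),
        "bytes_in": bytes_in,
    }
    # json.dumps emits double-quoted strings and plain numbers, both valid YAML,
    # so a source containing ':' or '#' cannot break the frontmatter.
    lines = [f"{key}: {json.dumps(value)}" for key, value in meta.items()]
    return "---\n" + "\n".join(lines) + "\n---\n\n" + markdown


def default_note_path(vault_root: Path, source: str) -> Path:
    """Place the note under the default folder, named after the source's stem."""
    return vault_root / "📥 Inbox" / "Converted" / f"{Path(source).stem}.md"
```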
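`detect_doc_type` runs structural heuristics, but the README does not say which ones. The sketch below invents a few plausible signals (an Abstract heading, mostly pipe-table rows, repeated `Q:` lines, numbered section headings) to show the idea; it covers only `paper`, `table`, `qa`, `manual`, and `default`, and every threshold is arbitrary. `guess_doc_type` is a hypothetical name.

```python
import re


def guess_doc_type(text: str) -> str:
    """Guess a doc_type from markdown structure (illustrative heuristics only)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "default"
    # Papers usually carry an explicit "Abstract" heading or label.
    if any(re.match(r"^(#+\s*)?abstract\b", ln, re.IGNORECASE) for ln in lines):
        return "paper"
    # Mostly pipe-delimited rows: treat the whole document as a table.
    table_rows = sum(ln.startswith("|") and ln.endswith("|") for ln in lines)
    if table_rows / len(lines) > 0.5:
        return "table"
    # Q&A: at least two lines starting with "Q:" or "Question:".
    questions = sum(bool(re.match(r"^(q|question)\s*[:.]", ln, re.IGNORECASE)) for ln in lines)
    if questions >= 2:
        return "qa"
    # Manuals: several numbered headings such as "## 2.1 Installation".
    numbered = sum(bool(re.match(r"^#+\s*\d+(\.\d+)*\s", ln)) for ln in lines)
    if numbered >= 3:
        return "manual"
    return "default"
```

Checks run from most to least specific, so a paper that also contains a table is still classified as `paper`.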
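The real `chunk_text` dispatches to doc-type-aware chunkers in the `chunkers/` package. The sketch below is only a generic packer, assuming whitespace-separated words approximate tokens; it shows one way to honor the "never merge across sections" rule described for `manual`: split at headings first, then pack paragraphs toward `target_tokens` without going over `max_tokens`. `chunk_markdown` is a hypothetical name.

```python
import re


def chunk_markdown(text: str, target_tokens: int = 300, max_tokens: int = 500) -> list[str]:
    """Pack paragraphs into chunks near target_tokens without crossing headings.

    Word counts stand in for token counts (an assumption; a real tokenizer differs).
    A single paragraph longer than max_tokens still becomes one oversized chunk.
    """
    # Split before every markdown heading so no chunk spans two sections.
    sections = re.split(r"(?m)^(?=#{1,6}\s)", text)
    chunks: list[str] = []
    for section in sections:
        current: list[str] = []
        size = 0
        for para in (p.strip() for p in section.split("\n\n")):
            if not para:
                continue
            n = len(para.split())
            # Close the current chunk once it reaches the target or would overflow.
            if current and (size >= target_tokens or size + n > max_tokens):
                chunks.append("\n\n".join(current))
                current, size = [], 0
            current.append(para)
            size += n
        if current:
            chunks.append("\n\n".join(current))
    return chunks
```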
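`benchmark` runs every available backend on one input and compares latency and output. A minimal sketch of that comparison, with backends passed in as plain callables (an assumption; the server discovers its own backends). `run_benchmark` and the row fields are hypothetical.

```python
import time
from typing import Callable


def run_benchmark(source: str, backends: dict[str, Callable[[str], str]]) -> list[dict]:
    """Run each backend on the same input; report latency, output size, and errors."""
    rows = []
    for name, backend in backends.items():
        start = time.perf_counter()
        try:
            text, error = backend(source), None
        except Exception as exc:  # a failing backend still gets a row
            text, error = "", str(exc)
        elapsed_ms = (time.perf_counter() - start) * 1000
        rows.append({"backend": name, "latency_ms": round(elapsed_ms, 1),
                     "chars": len(text), "error": error})
    # Fastest first, so the cheapest adequate backend is easy to spot.
    return sorted(rows, key=lambda r: r["latency_ms"])
```

Output length is only a rough proxy for quality; reading the markdown side by side is still what tells you whether Docling's tables beat markitdown's.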