Deterministic DOCX/PPTX/XLSX/PDF parser: track changes, comments, headers, footers, merged cells.
Universal document parsing in AILANG. Extracts structured content from DOCX, PPTX, XLSX, PDF, and image files into JSON and markdown.
Office formats (DOCX, PPTX, XLSX) use deterministic XML parsing: no AI, no cloud, instant results. PDFs and images are delegated to whichever AI model you plug in (Gemini, Claude, or a local Ollama model). AILANG Parse is AI-agnostic: change the `--ai` flag to switch backends, with no code changes.
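For readers curious what deterministic XML parsing of an Office file involves, here is a minimal Python sketch. It is not AILANG Parse's code (which is written in AILANG), and the function name `docx_paragraphs` is hypothetical. It relies only on the file format: a DOCX is a ZIP archive whose body text lives in `word/document.xml`, and tracked changes appear as `<w:ins>` and `<w:del>` elements, with deleted text stored in `<w:delText>`. Because nothing but the file's bytes feeds the result, the same input always yields the same output.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Tuple

# WordprocessingML namespace used in word/document.xml (ECMA-376).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _collect(elem: ET.Element, status: str, parts: List[Tuple[str, str]]) -> None:
    """Depth-first walk that tags every text node with its revision status."""
    for child in elem:
        if child.tag == W + "ins":
            _collect(child, "ins", parts)       # tracked insertion
        elif child.tag == W + "del":
            _collect(child, "del", parts)       # tracked deletion
        elif child.tag in (W + "t", W + "delText"):
            parts.append((status, child.text or ""))
        elif child.tag == W + "p":
            continue  # nested paragraph (e.g. in a text box): visited by the outer loop
        else:
            _collect(child, status, parts)      # runs, hyperlinks, etc.


def docx_paragraphs(data: bytes) -> List[List[Tuple[str, str]]]:
    """Return each paragraph of a DOCX as a list of (status, text) pairs.

    status is "text", "ins" or "del". The output depends only on the
    input bytes, so repeated calls always give the same result.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):  # document order, including table cells
        parts: List[Tuple[str, str]] = []
        _collect(p, "text", parts)
        paragraphs.append(parts)
    return paragraphs
```

For a paragraph where "10" was deleted and "12" inserted after "Total: ", it returns `[[("text", "Total: "), ("del", "10"), ("ins", "12")]]`. The sketch ignores moves (`w:moveFrom`/`w:moveTo`), comments, and field codes.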
Requires the AILANG CLI.
```bash
# Clone and symlink
git clone https://github.com/sunholo-data/ailang-parse.git
ln -s "$(pwd)/ailang-parse/bin/docparse" /usr/local/bin/docparse
```
Use AILANG Parse from your language of choice:
```bash
pip install ailang-parse                         # Python
npm install @ailang/parse                        # JavaScript/TypeScript
go get github.com/sunholo-data/ailang-parse-go   # Go
```
```bash
# Office documents (deterministic, no AI needed)
docparse report.docx
docparse slides.pptx
docparse spreadsheet.xlsx

# PDF and images (AI auto-enabled)
docparse document.pdf
docparse photo.png

# Options
docparse report.docx describe                    # AI image descriptions
docparse report.docx summarize                   # AI document summary
docparse scan.pdf --ai gemini-3-flash-preview    # Choose AI backend

# Format conversion
docparse report.docx --convert output.html
docparse data.csv --convert report.docx
docparse notes.md --convert slides.pptx

# AI document generation
ailang run --entry main --caps IO,FS,Env,AI --ai gemini-2.5-flash \
  docparse/main.ail --generate report.docx --prompt "Q1 sales report with tables"
```
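If you prefer to drive the CLI from a script, the following Python sketch wraps `docparse` with `subprocess`. It shells out to the CLI instead of using the `ailang-parse` Python package, and it uses only the `--ai` and `--convert` flags from the examples above plus the output paths this README lists (`docparse/data/output.json` and `docparse/data/output.md`). The helper names are hypothetical, and resolving `docparse/data` relative to the current working directory is an assumption; pass a different `output_dir` if your setup writes elsewhere.

```python
import json
import shutil
import subprocess
from pathlib import Path
from typing import Any, List, Optional, Tuple


def docparse_command(path: str, ai: Optional[str] = None,
                     convert: Optional[str] = None) -> List[str]:
    """Build a docparse argv from the flags used in the README examples."""
    cmd = ["docparse", path]
    if ai is not None:
        cmd += ["--ai", ai]            # AI backend, e.g. "gemini-3-flash-preview"
    if convert is not None:
        cmd += ["--convert", convert]  # target file, e.g. "output.html"
    return cmd


def load_outputs(output_dir: str = "docparse/data") -> Tuple[Any, str]:
    """Read the JSON and Markdown files a run writes.

    ASSUMPTION: output_dir is resolved relative to the current working
    directory; pass another path if your setup writes elsewhere.
    """
    out = Path(output_dir)
    data = json.loads((out / "output.json").read_text(encoding="utf-8"))
    markdown = (out / "output.md").read_text(encoding="utf-8")
    return data, markdown


def run_docparse(path: str, ai: Optional[str] = None,
                 output_dir: str = "docparse/data") -> Tuple[Any, str]:
    """Parse one file with the docparse CLI and return (json, markdown)."""
    if shutil.which("docparse") is None:
        raise FileNotFoundError("docparse is not on PATH")
    # check=True raises CalledProcessError if docparse exits non-zero.
    subprocess.run(docparse_command(path, ai=ai), check=True)
    return load_outputs(output_dir)
```

For example, `run_docparse("scan.pdf", ai="gemini-3-flash-preview")` runs the same command as the `--ai` example and returns the parsed JSON and Markdown.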
Every run produces:
- `docparse/data/output.json`: structured JSON with typed blocks
- `docparse/data/output.md`: LLM-ready Markdown

| Feature | DOCX | PPTX | XLSX | Best Competitor |
|---------|------|------|------|-----------------|
| Tables with merged cells | Yes | Yes | Yes | Raw OOXML only |
| Track changes (redlining) | Yes | - | - | Pandoc (3/3) |
| Comments (interleaved) | Yes | - | - | Raw OOXML (2/2) |
| Headers/footers | Yes | - | - | Kreuzberg (2/3) |
| Text boxes / VML shapes | Yes | Yes | - | Raw OOXML (1/2) |
| Equations (§22.1) | Yes | - | - | None |
| Field codes (§17.16) | Yes | - | - | Kreuzberg, OOXML |
| Speaker notes | - | Yes | - | None |
| Multi-sheet extraction | - | - | Yes | Kreuzberg |
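Merged cells are one of the harder parts of DOCX tables. In WordprocessingML, a horizontal merge is a single `<w:tc>` carrying `<w:gridSpan w:val="n"/>`, while a vertical merge is a `<w:vMerge w:val="restart"/>` cell followed, in later rows, by `<w:vMerge/>` continuation cells. The Python sketch below (hypothetical function `resolve_merged_cells`, not AILANG Parse's implementation) expands a `<w:tbl>` element into a rectangular grid by repeating the merged text in every slot it covers. It ignores `w:gridBefore`/`w:gridAfter`, rows wrapped in content controls, and nested table structure, so treat it as an illustration rather than a complete parser.

```python
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML namespace (ECMA-376).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _cell_text(tc: ET.Element) -> str:
    # Concatenate every <w:t> run inside the cell.
    return "".join(t.text or "" for t in tc.iter(W + "t"))


def resolve_merged_cells(tbl: ET.Element) -> List[List[str]]:
    """Expand a <w:tbl> into a rectangular grid of cell texts."""
    grid: List[List[str]] = []
    for tr in tbl.findall(W + "tr"):
        row: List[str] = []
        for tc in tr.findall(W + "tc"):
            span, vmerge = 1, None
            tcpr = tc.find(W + "tcPr")
            if tcpr is not None:
                gs = tcpr.find(W + "gridSpan")
                if gs is not None:
                    span = int(gs.get(W + "val", "1"))  # horizontal merge width
                vm = tcpr.find(W + "vMerge")
                if vm is not None:
                    # A bare <w:vMerge/> means "continue" per the spec.
                    vmerge = vm.get(W + "val", "continue")
            col = len(row)  # grid column where this cell starts
            if vmerge == "continue" and grid and col < len(grid[-1]):
                text = grid[-1][col]  # inherit text from the cell above
            else:
                text = _cell_text(tc)
            row.extend([text] * span)
        grid.append(row)
    return grid
```

Repeating the text, rather than leaving continuation slots empty, keeps every row the same width, which simplifies conversion to Markdown or JSON rows; a parser that must preserve the merge itself would record the span instead.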
OfficeDocBench (69 files, 11 formats, 7 metrics): AILANG Parse scores 93.9% composite with 100% coverage, versus 68.0% coverage-adjusted for the nearest competitor. Eight parsers were compared, including Raw OOXML, Pandoc, Kreuzberg, MarkItDown, Unstructured, and Docling. The scores include aspirational ECMA-376 spec targets that intentionally lower our score.
Parsing (13 formats): DOCX, PPTX, XLSX, OD