A Java-based server leveraging Apache Tika to extract content and metadata from files (PDF, DOCX, TXT, etc.) in a local files-to-extract directory. Supports HTML (with CSS styling) and text extraction, file listing, and metadata retrieval via MCP-compliant tools and REST APIs. Built with Spring Boot, Jetty, and MCP SDK.
The configuration is the same across MCP clients; only the config file and its path differ.
{
"mcpServers": {
"mcp-pdf-extractor-server": {
"command": "<see-readme>",
"args": []
}
}
}
The Tika MCP Extractor Server is a Model Context Protocol (MCP) compliant server that uses Apache Tika to extract content and metadata from files in various formats (e.g., PDF, DOCX, TXT, HTML, images) stored in a files-to-extract directory. It supports conversion to HTML (with optional CSS styling for better readability) or plain text and provides tools to list files and retrieve metadata. Built with Java 23, Spring Boot, Jetty, and the MCP SDK (0.11.0), it integrates with MCP-compliant clients like Claude Desktop or MCP Inspector.
The server exposes four MCP tools:
extract-to-html: Converts file content to HTML (with embedded CSS).
extract-text: Extracts plain text.
list-available-files: Lists files in the directory with details.
get-file-metadata: Retrieves detailed file metadata.
It also provides REST endpoints for testing, including a new endpoint to serve raw HTML directly for browser rendering. All operations are local, requiring no internet access, making it ideal for secure document processing workflows.
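As a rough illustration of how the testing endpoints might be probed from Java, the sketch below builds GET requests for `/api/health` and `/api/test/list` using the JDK's built-in `java.net.http` client. The host `localhost`, the use of GET, and the `Accept` header are assumptions; the port 45453 comes from the `server.port` setting in this README. The class name `HealthCheckSketch` and the helper `buildGet` are hypothetical and not part of the project.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class HealthCheckSketch {
    // Assumption: the server runs locally on the port set in application.properties.
    static final String BASE_URL = "http://localhost:45453";

    // Builds a GET request for an endpoint path (assumes the endpoint accepts GET).
    static HttpRequest buildGet(String path) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + path))
                .timeout(Duration.ofSeconds(5))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Only endpoints that appear to need no parameters are called here.
        for (String path : new String[] {"/api/health", "/api/test/list"}) {
            HttpResponse<String> response =
                    client.send(buildGet(path), HttpResponse.BodyHandlers.ofString());
            System.out.println(path + " -> HTTP " + response.statusCode());
            System.out.println(response.body());
        }
    }
}
```

The extraction endpoints are left out on purpose: the parameter names they expect are not stated here, so check the controller source before calling them.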
Files are read from files-to-extract; listings provide size, MIME type, and modification details.
REST endpoints for testing:
/api/test/list: Lists available files.
/api/test/extract-html: Extracts file content as JSON with an HTML string.
/api/test/extract-text: Extracts file content as plain text in JSON.
/api/test/raw-html: Serves raw HTML directly (renderable in browsers).
/api/health: Checks server and directory status.
Settings are configured in application.properties.
Files are processed locally from the files-to-extract directory; no internet required.
Clone the Repository (if hosted):
git clone https://github.com/RayenMalouche/MCP-PDF-Extractor-server.git
cd MCP-PDF-Extractor-server
Create the Files Directory:
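The default directory is files-to-extract (configurable).
mkdir files-to-extract
Add sample files (e.g., sample.pdf, document.docx) for testing.
Build the Project:
mvn clean install
The build output is written to target/.
Settings are defined in src/main/resources/application.properties: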
# Tika MCP Extractor Server Configuration
spring.application.name=TikaExtractorMCPServer
# Server Configuration
server.port=45453
# Tika Configuration
tika.max.string.length=-1
tika.detect.language=false
# File Processing Configuration
files.directory=files-to-extract
files.max.size=52428800
# Logging Configuration
logging.level.org.apache.tika=DEBUG
logging.level.org.apache.pdfbox=DEBUG
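
To make the numeric settings easier to read, the following sketch loads a subset of the properties above with `java.util.Properties` and converts `files.max.size` into mebibytes. It assumes the value is a byte count, which the configuration does not state explicitly; the class name `ConfigSketch` and its helpers are hypothetical and not part of the project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ConfigSketch {
    // Subset of the values listed in application.properties.
    static final String SAMPLE = String.join("\n",
            "server.port=45453",
            "files.directory=files-to-extract",
            "files.max.size=52428800");

    // Parses properties-format text into a Properties object.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Assumption: files.max.size is expressed in bytes.
    static long maxSizeMiB(Properties props) {
        return Long.parseLong(props.getProperty("files.max.size")) / (1024L * 1024L);
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(SAMPLE);
        System.out.println("port = " + props.getProperty("server.port"));
        System.out.println("directory = " + props.getProperty("files.directory"));
        System.out.println("max size = " + maxSizeMiB(props) + " MiB"); // 52428800 / 1048576 = 50
    }
}
```

Under that assumption, the default limit works out to 50 MiB per file.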