mcp-name: io.github.base76-research-lab/token-compressor
Semantic prompt compression for LLM workflows. Reduce token usage by 40–60% without losing meaning.
Built by Base76 Research Lab — research into epistemic AI architecture.
Intent Compiler MVP is now live and uses this project as part of its idea -> spec -> compressed-output flow.
token-compressor is a two-stage pipeline that compresses prompts before they reach an LLM: a local model rewrites the prompt, then an embedding comparison verifies the meaning survived. The result: shorter prompts, lower costs, same intent.
```
Input prompt (300 tokens)
        ↓
   LLM compresses
        ↓
Embedding validates (cosine ≥ 0.85?)
    ↓              ↓
  Pass            Fail
    ↓              ↓
compressed      original
(120 tokens)    (300 tokens)
```
Key design principle: conditionality is never sacrificed. If your prompt says "only do X if Y", that constraint survives compression.
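The compress-then-validate gate can be sketched as follows. This is a minimal illustration, not the library's implementation: `validate_or_fallback` and `toy_embed` are hypothetical stand-ins, and the real pipeline uses a local Ollama model for compression and `nomic-embed-text` for embeddings.

```python
import math

THRESHOLD = 0.85  # cosine similarity floor used by the validator


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def validate_or_fallback(original, compressed, embed):
    """Keep the compressed text only if its embedding stays close to the
    original's; otherwise fall back to the untouched prompt."""
    coverage = cosine(embed(original), embed(compressed))
    return (compressed, coverage) if coverage >= THRESHOLD else (original, coverage)


# Toy letter-frequency "embedder" for illustration only; the real pipeline
# embeds with nomic-embed-text via Ollama.
def toy_embed(text):
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec


# A meaning-preserving rewrite passes the gate and the compressed text is kept.
text, cov = validate_or_fallback("only do X if Y", "do X only if Y", toy_embed)
print(text)  # the compressed variant survives
```

The fallback branch is what makes the pipeline safe to run unattended: a compression that drifts below the similarity floor is silently discarded in favor of the original prompt.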
Install the models and Python dependencies (requires a local Ollama install):

```bash
ollama pull llama3.2:1b
ollama pull nomic-embed-text
pip install ollama numpy
```
```python
from compressor import LLMCompressEmbedValidate

pipeline = LLMCompressEmbedValidate()
result = pipeline.process("Your prompt text here...")
print(result.output_text)  # compressed (or original if validation failed)
print(result.report())     # MODE / COVERAGE / TOKENS saved
```
Result object:

| Field | Description |
|-------|-------------|
| `output_text` | Text to send to your LLM |
| `mode` | `compressed` / `raw_fallback` / `skipped` |
| `coverage` | Cosine similarity (0.0–1.0) |
| `tokens_in` | Estimated input tokens |
| `tokens_out` | Estimated output tokens |
| `tokens_saved` | Difference (`tokens_in - tokens_out`) |
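The fields above can be mirrored in a small dataclass. This is an illustrative sketch, not the library's actual class, and the exact `report()` wording may differ from the real implementation:

```python
from dataclasses import dataclass


@dataclass
class CompressionResult:
    output_text: str  # text to send to your LLM
    mode: str         # "compressed" / "raw_fallback" / "skipped"
    coverage: float   # cosine similarity, 0.0-1.0
    tokens_in: int    # estimated input tokens
    tokens_out: int   # estimated output tokens

    @property
    def tokens_saved(self) -> int:
        # Savings are simply the estimated token difference.
        return self.tokens_in - self.tokens_out

    def report(self) -> str:
        return (f"MODE {self.mode} / COVERAGE {self.coverage:.2f}"
                f" / TOKENS saved {self.tokens_saved}")


r = CompressionResult("shorter prompt...", "compressed", 0.91, 300, 120)
print(r.report())  # MODE compressed / COVERAGE 0.91 / TOKENS saved 180
```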
```bash
echo "Your long prompt here..." | python3 cli.py
```
Output: compressed text on stdout, stats on stderr.
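Because the compressed text and the stats go to separate streams, they can be captured independently. A sketch of the shell pattern, using an inline stand-in instead of `cli.py` so it runs anywhere (substitute the real script path in practice):

```shell
# Stand-in for: echo "prompt" | python3 cli.py > out.txt 2> stats.log
echo "Your long prompt here..." \
  | python3 -c 'import sys; sys.stdout.write(sys.stdin.read()); sys.stderr.write("MODE compressed\n")' \
  > /tmp/compressed.txt 2> /tmp/compress_stats.log

cat /tmp/compressed.txt      # compressed text only
cat /tmp/compress_stats.log  # stats only
```

Keeping stats on stderr means the stdout of the CLI can be piped straight into another tool without stripping anything first.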
Add to your ~/.claude/settings.json under hooks → UserPromptSubmit:
```json
{
  "type": "command",
  "command": "echo \"${CLAUDE_USER_PROMPT:-}\" | python3 /path/to/token-compressor/cli.py > /tmp/compressed_prompt.txt 2>/tmp/compress.log || true"
}
```
This runs on every prompt submission and writes the compressed version to a temp file, which can be injected back into context via a second hook or MCP server.
The MCP server exposes compression as a tool callable from Claude Code and any MCP-compatible client.
Install:

```bash
pip install token-compressor-mcp
```
Tool: `compress_prompt`

Parameters: `text` (string)

Claude Code MCP config (`~/.claude/settings.json`):
```json
{
  "mcpServers": {
    "token-compressor": {
      "command": "uvx",
      "args": ["token-compressor-mcp"]
    }
  }
}
```
Or from source:
```json
{
  "mcpServers": {
    "token-compressor": {
      "command": "python3",
      "args": ["-m", "token_compressor_mcp"],
      "cwd": "/path/to/token-compressor"
    }
  }
}
```
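With either config, an MCP client invokes the tool through the protocol's standard `tools/call` request. A hypothetical invocation with the single `text` parameter (the prompt text is an example):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "compress_prompt",
    "arguments": {
      "text": "Only summarize the document if it is written in English; otherwise reply 'skipped'."
    }
  }
}
```

Note the example prompt is itself conditional ("only ... if"), the case the pipeline promises to preserve.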
```python
pipeline = LLMCompressEmbedValidate(
    threshold=0.85,  # cosine similarity floor (lower = more aggressive)
    ...
)
```

[View full README on GitHub](https://github.com/base76-research-lab/token-compressor#readme)