Compress prompts by 40-60% using a local LLM plus embedding validation. Preserves all conditionals.
Config is the same across clients — only the file and path differ.
{
"mcpServers": {
"token-compressor": {
"cwd": "/path/to/token-compressor",
"args": [
"-m",
"token_compressor_mcp"
],
"command": "python3"
}
}
}
mcp-name: io.github.base76-research-lab/token-compressor
Security advisories reported for ollama:
PYSEC-2026-102 (affected versions: >= 0)
An issue in ollama v.0.12.10 allows a remote attacker to cause a denial of service via the fs/ggml/gguf.go, function readGGUFV1String reads a string length from untrusted GGUF metadata
PYSEC-2026-101 (affected versions: >= 0)
An issue in ollama v.0.12.10 allows a remote attacker to cause a denial of service via the GGUF decoder
PYSEC-2025-146 (affected versions: >= 0)
An issue in Ollama v0.1.33 allows attackers to delete arbitrary files via sending a crafted packet to the endpoint /api/pull.
PYSEC-2025-147 (affected versions: >= 0)
Cross-Domain Token Exposure in server.auth.getAuthorizationToken in Ollama 0.6.7 allows remote attackers to steal authentication tokens and bypass access controls via a malicious realm value in a WWW-Authenticate header returned by the /api/pull endpoint.
PYSEC-2025-145 (affected versions: >= 0)
A vulnerability in the Ollama server version 0.5.11 allows a malicious user to cause a Denial of Service (DoS) attack by customizing the manifest content and spoofing a service. This is due to improper validation of array index access when downloading a model via the /api/pull endpoint, which can lead to a server crash.
Semantic prompt compression for LLM workflows. Reduce token usage by 40–60% without losing meaning.
Built by Base76 Research Lab — research into epistemic AI architecture.
Intent Compiler MVP is now live and uses this project as part of the idea -> spec -> compressed output flow.
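token-compressor is a two-stage pipeline that compresses prompts before they reach an LLM:
1. LLM compression: a local LLM rewrites the prompt in fewer tokens.
2. Embedding validation: the compressed prompt is accepted only if its embedding has a cosine similarity of at least 0.85 with the original; otherwise the original prompt is used unchanged.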
The result: shorter prompts, lower costs, same intent.
Input prompt (300 tokens)
↓
LLM compresses
↓
Embedding validates (cosine ≥ 0.85?)
↓
Pass → compressed (120 tokens)
Fail → original (300 tokens)
Key design principle: conditionality is never sacrificed. If your prompt says "only do X if Y", that constraint survives compression.
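The compress-then-validate flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: `compress_fn` and `embed_fn` are hypothetical stand-ins for the local LLM and embedding-model calls, and `toy_embed` is a deliberately crude word-count embedding used only so the example runs without any model. Only the 0.85 threshold and the fall-back-to-original behavior come from the diagram.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical vocabulary for the toy embedding below (illustration only).
VOCAB = ("summarize", "report", "only", "if", "urgent")


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(text: str) -> List[float]:
    """Toy stand-in for an embedding model: counts of a few fixed words."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def compress_with_validation(
    prompt: str,
    compress_fn: Callable[[str], str],
    embed_fn: Callable[[str], Sequence[float]],
    threshold: float = 0.85,
) -> Tuple[str, float]:
    """Return (text_to_send, coverage); fall back to the original on low coverage."""
    candidate = compress_fn(prompt)
    coverage = cosine(embed_fn(prompt), embed_fn(candidate))
    if coverage >= threshold:
        return candidate, coverage  # validation passed: use the compressed text
    return prompt, coverage         # validation failed: keep the original


prompt = "Please summarize the report only if it is urgent"

# A compression that keeps the condition passes validation.
kept, _ = compress_with_validation(
    prompt, lambda p: "summarize report only if urgent", toy_embed
)

# A compression that drops "only if urgent" fails and the original is returned.
dropped, _ = compress_with_validation(prompt, lambda p: "summarize report", toy_embed)
```

In this toy setup the second call returns the original prompt, which mirrors the design principle: a candidate that loses the conditional scores too low to be accepted.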
ollama pull llama3.2:1b
ollama pull nomic-embed-text
pip install ollama numpy
from compressor import LLMCompressEmbedValidate
pipeline = LLMCompressEmbedValidate()
result = pipeline.process("Your prompt text here...")
print(result.output_text) # compressed (or original if validation failed)
print(result.report()) # MODE / COVERAGE / TOKENS saved
Result object:
| Field | Description |
|---|---|
| `output_text` | Text to send to your LLM |
| `mode` | `compressed` / `raw_fallback` / `skipped` |
| `coverage` | Cosine similarity (0.0–1.0) |
| `tokens_in` | Estimated input tokens |
| `tokens_out` | Estimated output tokens |
| `tokens_saved` | Difference |
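To make the fields concrete, here is a rough sketch of an object with the same shape and a helper that branches on `mode`. The class name `CompressionResult` and the `summarize` helper are hypothetical and are not part of the library. Treating `tokens_saved` as `tokens_in - tokens_out` follows the "Difference" entry in the table, and describing both non-compressed modes as "no compression applied" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CompressionResult:
    """Hypothetical stand-in mirroring the documented result fields."""

    output_text: str
    mode: str        # "compressed", "raw_fallback", or "skipped"
    coverage: float  # cosine similarity, 0.0 to 1.0
    tokens_in: int
    tokens_out: int

    @property
    def tokens_saved(self) -> int:
        # "Difference" in the table: input estimate minus output estimate.
        return self.tokens_in - self.tokens_out


def summarize(result) -> str:
    """One-line summary for any object exposing the documented fields."""
    if result.mode == "compressed":
        return (
            f"compressed: saved {result.tokens_saved} tokens "
            f"(coverage {result.coverage:.2f})"
        )
    # Assumption: in the other modes the prompt is not compressed.
    return f"{result.mode}: no compression applied"


example = CompressionResult(
    output_text="summarize report only if urgent",
    mode="compressed",
    coverage=0.91,
    tokens_in=300,
    tokens_out=120,
)
print(summarize(example))
```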
echo "Your long prompt here..." | python3 cli.py
Output: compressed text on stdout, stats on stderr.
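Because the compressed text and the stats arrive on separate streams, a calling script can capture them independently. The sketch below shows one way to do that from Python with the standard library; `run_cli` is a hypothetical helper, and `CLI_COMMAND` assumes `cli.py` is in the current working directory.

```python
import subprocess
import sys
from typing import List, Tuple

# Assumption: cli.py sits in the working directory; adjust the path as needed.
CLI_COMMAND = [sys.executable, "cli.py"]


def run_cli(prompt: str, command: List[str]) -> Tuple[str, str]:
    """Pipe `prompt` to `command` on stdin and return (stdout, stderr) as text."""
    proc = subprocess.run(
        command,
        input=prompt,
        capture_output=True,  # keep stdout and stderr separate
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return proc.stdout.strip(), proc.stderr.strip()


# Intended use:
# compressed, stats = run_cli("Your long prompt here...", CLI_COMMAND)
```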
Add to your ~/.claude/settings.json under hooks → UserPromptSubmit:
{
"type": "command",
"command": "echo \"${CLAUDE_USER_PROMPT:-}\" | python3 /path/to/token-compressor/cli.py > /tmp/compressed_prompt.txt 2>/tmp/compress.log || true"
}
This runs on every prompt submission and writes the compressed version to a temp file, which can be injected back into context via a second hook or MCP server.
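A second hook or server that consumes the temp file should probably tolerate a missing or empty file: the hook command redirects with `>` and ends in `|| true`, so a failed run can leave `/tmp/compressed_prompt.txt` empty without reporting an error. A minimal sketch of that fallback follows; `load_compressed` is a hypothetical helper, and only the file path comes from the hook command.

```python
from pathlib import Path


def load_compressed(original: str, path: str = "/tmp/compressed_prompt.txt") -> str:
    """Return the compressed prompt from `path`, or `original` if unavailable."""
    try:
        text = Path(path).read_text(encoding="utf-8").strip()
    except OSError:
        # File missing or unreadable: the hook has not run or it failed early.
        return original
    # An empty file means compression produced nothing usable.
    return text or original
```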
The MCP server exposes compression as a tool callable from Claude Code and any MCP-compatible client.
Install:
pip install token-compressor-mcp
Tool: compress_prompt
text (string)
Claude Code MCP config (~/.claude/settings.json):
{
"mcpServers": {
"token-compressor": {
"command": "uvx",
"args": ["token-compressor-mcp"]
}
}
}
Or from source:
{
"mcpServers": {
"token-compressor": {
"command": "python3",
"args": ["-m", "token_compressor_mcp"],
"cwd": "/path/to/token-compressor"
}
}
}
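For clients that speak MCP directly rather than through a configured host, invoking the tool comes down to a JSON-RPC `tools/call` request. The sketch below only builds that message. The `compress_prompt` name and `text` argument come from the tool description above, while the helper name and the `id` value are illustrative. Transport setup and the `initialize` handshake are omitted.

```python
import json


def build_compress_request(text: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for compress_prompt."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,  # illustrative; any unique request id works
        "method": "tools/call",
        "params": {
            "name": "compress_prompt",
            "arguments": {"text": text},
        },
    }
    return json.dumps(message)


print(build_compress_request("Your long prompt here..."))
```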
pipeline = LLMCompressEmbedValidate(
threshold=0.85, # cosine similarity floor (lower = more aggressive)
...
[View full README on GitHub](https://github.com/base76-research-lab/token-compressor#readme)