MCP server that saves Claude Code tokens by delegating bounded tasks to local or cloud LLMs. 93% token savings benchmarked. Works with LM Studio, Ollama, vLLM, DeepSeek, Groq, Cerebras.
```json
{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "LM_STUDIO_URL": "http://localhost:1234"
      }
    }
  }
}
```
Quick Navigation
How it works | Token savings | Quick start | What gets offloaded | Tools | Model routing | Configuration | Compatible endpoints
I built this because I kept leaving Claude Code running overnight on big refactors and the token bill was painful. A huge chunk of that spend goes on bounded tasks any decent model handles fine - generating boilerplate, code review, commit messages, format conversion. Stuff that doesn't need Claude's reasoning or tool access.
Houtini LM connects Claude Code to a local LLM on your network - or any OpenAI-compatible API. Claude keeps doing the hard work - architecture, planning, multi-file changes - and offloads the grunt work to whatever cheaper model you've got running. Free. No rate limits. Private.
I wrote a full walkthrough of why I built this and how I use it day to day.
```
Claude Code (orchestrator)
 |
 |-- Complex reasoning, planning, architecture --> Claude API (your tokens)
 |
 +-- Bounded grunt work --> houtini-lm --HTTP/SSE--> Your local LLM (free)
       Boilerplate & test stubs         Qwen, Llama, Nemotron, GLM...
       Code review & explanations       LM Studio, Ollama, vLLM, llama.cpp
       Commit messages & docs           DeepSeek, Groq, Cerebras (cloud)
       Format conversion
       Mock data & type definitions
       Embeddings for RAG pipelines
```
Claude's the architect. Your local model's the drafter. Claude QAs everything.
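Under the hood, the delegated call is just a request to an OpenAI-compatible `/v1/chat/completions` endpoint. Here's a minimal sketch of what a delegated code review looks like at that layer — the endpoint path and payload shape are the standard OpenAI-compatible ones, but the model name and prompt wording are illustrative, not houtini-lm's exact internals:

```typescript
// Sketch of a delegated code review against an OpenAI-compatible endpoint
// (LM Studio, Ollama, vLLM, llama.cpp server...). Field values are illustrative.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  temperature: number;
}

function buildReviewRequest(source: string): ChatRequest {
  return {
    model: "qwen2.5-coder-14b-instruct", // whatever your server has loaded
    messages: [
      {
        role: "system",
        content: "You are a code reviewer. Return a concise summary of issues.",
      },
      // The full source file goes here — it never enters Claude's context.
      { role: "user", content: source },
    ],
    temperature: 0.2,
  };
}

async function review(source: string): Promise<string> {
  const baseUrl = process.env.LM_STUDIO_URL ?? "http://localhost:1234";
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildReviewRequest(source)),
  });
  const data = await res.json();
  // Only this short summary travels back to Claude.
  return data.choices[0].message.content;
}
```

Claude only ever sees the tool call going out and the summary coming back, which is where the token savings below come from.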
We built a benchmark using real source files (581–2022 lines of TypeScript) across realistic delegation patterns. The savings come from context avoidance — when Claude delegates, it never reads the source file into its context window.
| Task | Claude direct | Delegated | Saved |
|---|---|---|---|
| Code review (1352 lines) | 14,466 tok | 769 tok | 95% |
| Architecture review (2022 lines) | 20,014 tok | 983 tok | 95% |
| External repo review (581 lines) | 5,344 tok | 741 tok | 86% |
| Code explanation (833 lines) | 8,678 tok | 744 tok | 91% |
93.3% net token savings across the session. Without delegation, Claude reads 14,000 tokens of source code then generates a 500-token review. With delegation, Claude sends a ~250 token tool call and reads back a ~500 token summary. The source file never enters Claude's context.
Small tasks (quick answers, commit messages) don't save tokens — the ~250 token MCP overhead dominates. But for anything involving reading and analysing files, which is the majority of real coding sessions, delegation pays for itself immediately.
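The percentages above follow directly from the measured token counts — here's a quick check using the numbers copied from the table:

```typescript
// Verify the savings figures from the benchmark table.
const runs: [string, number, number][] = [
  // [task, Claude-direct tokens, delegated tokens]
  ["code review",          14466, 769],
  ["architecture review",  20014, 983],
  ["external repo review",  5344, 741],
  ["code explanation",      8678, 744],
];

let direct = 0;
let delegated = 0;
for (const [task, d, g] of runs) {
  direct += d;
  delegated += g;
  const saved = Math.round((1 - g / d) * 100);
  console.log(`${task}: ${saved}% saved`);
}
// 1 - 3237/48502 ≈ 0.933 → the 93.3% net figure
console.log(`net: ${((1 - delegated / direct) * 100).toFixed(1)}% saved`);
```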
Run the benchmark against your own setup: `LM_STUDIO_URL=http://your-server:1234 node benc…`