End-to-end documentation to set up your own local & fully private LLM server on Debian. Equipped with chat, web search, RAG, model management, MCP servers, image generation, and TTS.
Config is the same across clients — only the file and path differ.
{
  "mcpServers": {
    "llm-server-docs": {
      "args": [
        "vllm"
      ],
      "command": "uvx"
    }
  }
}
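Since only the file and path differ between clients, adding the entry can be scripted. The following is a minimal sketch, not an official installer: the helper name `add_server` is hypothetical, the config path is whatever your client documents, and the sketch assumes the client stores its servers under a top-level `mcpServers` key as in the snippet above.

```python
import json
from pathlib import Path

# The entry from the listing above, copied as-is.
SERVER_ENTRY = {"llm-server-docs": {"command": "uvx", "args": ["vllm"]}}


def add_server(config_path, entry=SERVER_ENTRY):
    """Merge `entry` into the "mcpServers" object of a JSON config file.

    Hypothetical helper: creates the file if it does not exist and
    leaves any other servers already listed in it untouched.
    """
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Create the "mcpServers" object if missing, then add or overwrite our entry.
    config.setdefault("mcpServers", {}).update(entry)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path your client reads, for example `add_server("path/to/client_config.json")`; the path here is a placeholder, not a real client location.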
Run this in your terminal to verify the server starts.
uvx 'vllm' 2>&1 | head -1 && echo "✓ Server started successfully"
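In the one-liner above, `&&` tests the exit status of `head`, not of `uvx`, so the success message can print even when the command fails. A stricter check can be sketched in Python. This is an illustration under stated assumptions: the function name `probe_command` and the default wait are hypothetical, and "still running after the wait" is treated as "started", which is a heuristic rather than a real health check.

```python
import subprocess
import sys


def probe_command(cmd, wait_seconds=5.0):
    """Run `cmd` and return (ok, first_output_line).

    ok is True if the process exits with status 0, or if it is still
    running after `wait_seconds` (assumed to mean it started).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
        text=True,
    )
    try:
        output, _ = proc.communicate(timeout=wait_seconds)
        # The process exited on its own: judge it by its exit status.
        ok = proc.returncode == 0
    except subprocess.TimeoutExpired:
        # Still running: count that as started, then stop it and
        # collect whatever it printed so far.
        proc.terminate()
        output, _ = proc.communicate()
        ok = True
    lines = output.splitlines()
    return ok, (lines[0] if lines else "")


# Example (not run here): ok, line = probe_command(["uvx", "vllm"])
```

Unlike the shell pipeline, the result depends on the command's own exit status, and the first line of output is still returned for inspection.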
PYSEC-2026-145
vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition
vLLM Vulnerable to Remote DoS via Special-Token Placeholders
## Summary
This report explains a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on `image_grid_thw`/`video_grid_thw` are affected. Severity:
vLLM makes Use of Uninitialized Resource
A vulnerability was found in vLLM up to 0.19.0. The affected element is the function has_mamba_layers of the file vllm/v1/kv_cache_interface.py of the component KV Block Handler. Performing a manipulation results in uninitialized resource. It is possible to initiate the attack remotely. The attack is considered to have high complexity. The exploitability is described as difficult. The exploit has been made public and could be used. The patch is named 1ad67864c0c20f167929e64c875f5c28e1aad9fd. To
PYSEC-2026-144
vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which is enforced by the load_bytes() code path, is completely bypassed in the video/jpeg base64 path. An attacker can send a single API request containing thousands of co
vLLM: Server-Side Request Forgery (SSRF) in `download_bytes_from_url`
### Summary
A Server-Side Request Forgery (SSRF) vulnerability in `download_bytes_from_url` allows any actor who can control batch input JSON to make the vLLM batch runner issue arbitrary HTTP/HTTPS requests from the server, without any URL validation or domain restrictions. This can be used to target internal services (e.g. cloud metadata endpoints or internal HTTP APIs) reachable from the vLLM host.
------
### Details
#### Vulnerable component
The vulnerable logic is in the batch runner
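Several of the advisories above give an affected version range, for example PYSEC-2026-144, which covers vLLM from 0.7.0 up to but not including 0.19.0. A minimal sketch of checking a version string against such a range follows. The helper names are hypothetical, and the comparison uses only the numeric release segment, so pre-release and post-release suffixes are ignored; a full implementation would follow the Python version specification instead.

```python
import re


def release_tuple(version):
    """Numeric release segment of a version string, trailing zeros dropped.

    '0.18.1.post1' -> (0, 18, 1); '0.19.0' and '0.19' both -> (0, 19).
    Suffixes such as 'rc1' or '.post1' are ignored (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def in_affected_range(version, introduced, fixed):
    """True if introduced <= version < fixed, by release numbers only."""
    return release_tuple(introduced) <= release_tuple(version) < release_tuple(fixed)


# Range taken from the PYSEC-2026-144 entry above.
print(in_affected_range("0.18.1", "0.7.0", "0.19.0"))
```

To check an installed copy, the version string could be obtained with `importlib.metadata.version("vllm")` and passed in as the first argument.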
Others in ai-ml / education
Dynamic problem-solving through sequential thought chains
Persistent memory using a knowledge graph
Workspace template + MCP server for Claude Code, Codex CLI, Cursor & Windsurf. Multi-agent knowledge engine (ag-refresh / ag-ask) that turns any codebase into a queryable AI assistant.
The official MCP server implementation for the Perplexity API Platform