End-to-end documentation to set up your own local & fully private LLM server on Debian. Equipped with chat, web search, RAG, model management, MCP servers, image generation, and TTS.
{
"mcpServers": {
"llm-server-docs": {
"args": [
"vllm"
],
"command": "uvx"
}
}
}Are you the author?
Add this badge to your README to show your security score and help users find safe servers.
End-to-end documentation to set up your own local & fully private LLM server on Debian. Equipped with chat, web search, RAG, model management, MCP servers, image generation, and TTS.
Is it safe?
4 open CVEs (38 fixed). Verify on OSV.dev →
No authentication — any process on your machine can connect.
MIT. View license →
Is it maintained?
Last commit 37 days ago. 733 stars.
Will it work with my client?
Transport: stdio, sse, http. Works with Claude Desktop, Cursor, Claude Code, and most MCP clients.
Run this in your terminal to verify the server starts. Then let us know if it worked — your result helps other developers.
uvx 'vllm' 2>&1 | head -1 && echo "✓ Server started successfully"
After testing, let us know if it worked:
This server is missing a description. Tools and install config are also missing.If you've used it, help the community.
Add information4 open vulnerabilities.
CVE-2026-34755FixedvLLM: Denial of Service via Unbounded Frame Count in video/jpeg Base64 Processing
## Summary The `VideoMediaIO.load_base64()` method at `vllm/multimodal/media/video.py:51-62` splits `video/jpeg` data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The `num_frames` parameter (default: 32), which is enforced by the `load_bytes()` code path at line 47-48, is completely bypassed in the `video/jpeg` base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, causing the serve
CVE-2026-34753FixedvLLM: Server-Side Request Forgery (SSRF) in `download_bytes_from_url `
### Summary A Server Side Request Forgery (SSRF) vulnerability in `download_bytes_from_url` allows any actor who can control batch input JSON to make the vLLM batch runner issue arbitrary HTTP/HTTPS requests from the server, without any URL validation or domain restrictions. This can be used to target internal services (e.g. cloud metadata endpoints or internal HTTP APIs) reachable from the vLLM host. ------ ### Details #### Vulnerable component The vulnerable logic is in the batch runner
Have you used this server?
Share your experience — it helps other developers decide.
Sign in to write a review.
Dynamic problem-solving through sequential thought chains
A Model Context Protocol server for searching and analyzing arXiv papers
An open-source AI agent that brings the power of Gemini directly into your terminal.
The official Python SDK for Model Context Protocol servers and clients
MCP Security Weekly
Get CVE alerts and security updates for Llm Server Docs and similar servers.
Start a conversation
Ask a question, share a tip, or report an issue.
Sign in to join the discussion.
CVE-2026-34756FixedvLLM: Unauthenticated OOM Denial of Service via Unbounded `n` Parameter in OpenAI API Server
### Summary A Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the `n` parameter in the `ChatCompletionRequest` and `CompletionRequest` Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large `n` value. This completely blocks the Python `asyncio` event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the
CVE-2026-27893FixedvLLM has Hardcoded Trust Override in Model Files Enables RCE Despite Explicit User Opt-Out
### Summary Two model implementation files hardcode `trust_remote_code=True` when loading sub-components, bypassing the user's explicit `--trust-remote-code=False` security opt-out. This enables remote code execution via malicious model repositories even when the user has explicitly disabled remote code trust. ### Details **Affected files (latest main branch):** 1. `vllm/model_executor/models/nemotron_vl.py:430` ```python vision_model = AutoModel.from_config(config.vision_confi
CVE-2026-25960FixedvLLM has SSRF Protection Bypass
## Summary The SSRF protection fix for https://github.com/vllm-project/vllm/security/advisories/GHSA-qh4c-xf7m-gxfc can be bypassed in the `load_from_url_async` method due to inconsistent URL parsing behavior between the validation layer and the actual HTTP client. ## Affected Component - **File**: `vllm/connections.py` - **Function**: `load_from_url_async` ## Vulnerability Details ### Root Cause The SSRF [fix](https://github.com/vllm-project/vllm/pull/32746) uses `urllib3.util.parse_url()
CVE-2026-22778FixedvLLM has RCE In Video Processing
## Summary **A chain of vulnerabilities in vLLM allow Remote Code Execution (RCE):** 1. **Info Leak** - PIL error messages expose memory addresses, bypassing ASLR 2. **Heap Overflow** - JPEG2000 decoder in OpenCV/FFmpeg has a heap overflow that lets us hijack code execution **Result:** Send a malicious video URL to vLLM Completions or Invocations **for a video model** -> Execute arbitrary commands on the server Completely default vLLM instance directly from pip, or docker, does not have auth
CVE-2026-24779FixedvLLM vulnerable to Server-Side Request Forgery (SSRF) through MediaConnector
### Summary A Server-Side Request Forgery (SSRF) vulnerability exists in the `MediaConnector` class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods obtain and process media from URLs provided by users, using different Python parsing libraries when restricting the target host. These two parsing libraries have different interpretations of backslashes, which allows the host name restriction to be bypassed. This allows an attacker to coerce the vLL
CVE-2026-22807FixedvLLM affected by RCE via auto_map dynamic module loading during model initialization
# Summary vLLM loads Hugging Face `auto_map` dynamic modules during model resolution **without gating on `trust_remote_code`**, allowing attacker-controlled Python code in a model repo/path to execute at server startup. --- # Impact An attacker who can influence the model repo/path (local directory or remote Hugging Face repo) can achieve **arbitrary code execution** on the vLLM host during model load. This happens **before any request handling** and does **not require API access**. ---
CVE-2026-22773FixedvLLM is vulnerable to DoS in Idefics3 vision models via image payload with ambiguous dimensions
### Summary Users can crash the vLLM engine serving multimodal models that use the _Idefics3_ vision model implementation by sending a specially crafted 1x1 pixel image. This causes a tensor dimension mismatch that results in an unhandled runtime error, leading to complete server termination. ### Details The vulnerability is triggered when the image processor encounters a 1x1 pixel image with shape (1, 1, 3) in HWC (Height, Width, Channel) format. Due to the ambiguous dimensions, the processor
GHSA-mcmc-2m55-j8jjFixedvLLM introduced enhanced protection for CVE-2025-62164
### Summary The fix [here](https://github.com/vllm-project/vllm/pull/27204) for CVE-2025-62164 is not sufficient. The fix only disables prompt embeds by default rather than addressing the root cause, so the DoS vulnerability remains when the feature is enabled. ### Details vLLM's pending change attempts to fix the root cause, which is the missing sparse tensor validation. PyTorch (~v2.0) disables sparse tensor validation (specifically, sparse tensor invariants checks) by default for performanc
CVE-2025-66448FixedvLLM vulnerable to remote code execution via transformers_utils/get_config
### Summary `vllm` has a critical remote code execution vector in a config class named `Nemotron_Nano_VL_Config`. When `vllm` loads a model config that contains an `auto_map` entry, the config class resolves that mapping with `get_class_from_dynamic_module(...)` and immediately instantiates the returned class. This fetches and executes Python from the remote repository referenced in the `auto_map` string. Crucially, this happens even when the caller explicitly sets `trust_remote_code=False` in
CVE-2025-62426FixedvLLM vulnerable to DoS via large Chat Completion or Tokenization requests with specially crafted `chat_template_kwargs`
### Summary The /v1/chat/completions and /tokenize endpoints allow a `chat_template_kwargs` request parameter that is used in the code before it is properly validated against the chat template. With the right `chat_template_kwargs` parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests ### Details In serving_engine.py, the chat_template_kwargs are unpacked into kwargs passed to chat_utils.py `apply_hf_chat_template` with no valida
CVE-2025-62372FixedvLLM vulnerable to DoS with incorrect shape of multimodal embedding inputs
### Summary Users can crash the vLLM engine serving multimodal models by passing multimodal embedding inputs with correct `ndim` but incorrect `shape` (e.g. hidden dimension is wrong), regardless of whether the model is intended to support such inputs (as defined in the Supported Models page). The issue has existed ever since we added support for image embedding inputs, i.e. #6613 (released in v0.5.5) ### Details Using image embeddings as an example: - For models that support image embeddin
CVE-2025-62164FixedvLLM deserialization vulnerability leading to DoS and potential RCE
### Summary A memory corruption vulnerability that leading to a crash (denial-of-service) and potentially remote code execution (RCE) exists in vLLM versions 0.10.2 and later, in the Completions API endpoint. When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass
CVE-2025-6242FixedvLLM is vulnerable to Server-Side Request Forgery (SSRF) through `MediaConnector` class
### Summary A Server-Side Request Forgery (SSRF) vulnerability exists in the `MediaConnector` class within the vLLM project's multimodal feature set. The `load_from_url` and `load_from_url_async` methods fetch and process media from user-provided URLs without adequate restrictions on the target hosts. This allows an attacker to coerce the vLLM server into making arbitrary requests to internal network resources. This vulnerability is particularly critical in containerized environments like `llm
CVE-2025-61620FixedvLLM: Resource-Exhaustion (DoS) through Malicious Jinja Template in OpenAI-Compatible Server
### Summary A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the `chat_template` and `chat_template_kwargs` parameters. If an attacker can supply these parameters to the API, they can cause a service outage by exhausting CPU and/or memory resources. ### Details When using an LLM as a chat model, the conversation history must be rendered into a text input for the model. In `hf/
CVE-2025-59425FixedvLLM is vulnerable to timing attack at bearer auth
### Summary The API key support in vLLM performed validation using a method that was vulnerable to a timing attack. This could potentially allow an attacker to discover a valid API key using an approach more efficient than brute force. ### Details https://github.com/vllm-project/vllm/blob/4b946d693e0af15740e9ca9c0e059d5f333b1083/vllm/entrypoints/openai/api_server.py#L1270-L1274 API key validation used a string comparison that will take longer the more characters the provided API key gets corre
CVE-2025-9141FixedvLLM has remote code execution vulnerability in the tool call parser for Qwen3-Coder
### Summary An unsafe deserialization vulnerability allows any authenticated user to execute arbitrary code on the server if they are able to get the model to pass the code as an argument to a tool call. ### Details vLLM's [Qwen3 Coder tool parser](https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/tool_parsers/qwen3coder_tool_parser.py) contains a code execution path that uses Python's `eval()` function to parse tool call parameters. This occurs during the parameter conver
CVE-2025-48956Fixedvllm API endpoints vulnerable to Denial of Service Attacks
### Summary A Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user. ### Details The vulnerability leverages the abuse of HTTP headers. By setting a header such as `X-Forwarded-For` to a very large value like `("A" * 5_800_000_000
CVE-2025-48942FixedPYSEC-2025-54
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.8.0 up to but excluding 0.9.0, hitting the /v1/completions API with a invalid json_schema as a Guided Param kills the vllm server. This vulnerability is similar GHSA-9hcf-v7m4-6m2j/CVE-2025-48943, but for regex instead of a JSON schema. Version 0.9.0 fixes the issue.
CVE-2025-48943FixedPYSEC-2025-55
vLLM is an inference and serving engine for large language models (LLMs). Version 0.8.0 up to but excluding 0.9.0 have a Denial of Service (ReDoS) that causes the vLLM server to crash if an invalid regex was provided while using structured output. This vulnerability is similar to GHSA-6qc9-v4r8-22xg/CVE-2025-48942, but for regex instead of a JSON schema. Version 0.9.0 fixes the issue.
CVE-2025-48887FixedPYSEC-2025-50
vLLM, an inference and serving engine for large language models (LLMs), has a Regular Expression Denial of Service (ReDoS) vulnerability in the file `vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py` of versions 0.6.4 up to but excluding 0.9.0. The root cause is the use of a highly complex and nested regular expression for tool call detection, which can be exploited by an attacker to cause severe performance degradation or make the service unavailable. The pattern contains multiple n
CVE-2025-46570FixedPYSEC-2025-53
vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0.
CVE-2025-46722FixedPYSEC-2025-43
vLLM is an inference and serving engine for large language models (LLMs). In versions starting from 0.7.0 to before 0.9.0, in the file vllm/multimodal/hasher.py, the MultiModalHasher class has a security and data integrity issue in its image hashing method. Currently, it serializes PIL.Image.Image objects using only obj.tobytes(), which returns only the raw pixel data, without including metadata such as the image’s shape (width, height, mode). As a result, two images of different sizes (e.g., 30
CVE-2025-48944FixedvLLM Tool Schema allows DoS via Malformed pattern and type Fields
### Summary The vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted. ### Details The "type" field is expected to be one of: "string", "number", "object", "boolean", "array", or "null".
GHSA-j828-28rj-hfhpFixedvLLM vulnerable to Regular Expression Denial of Service
### Summary A recent review identified several regular expressions in the vllm codebase that are susceptible to Regular Expression Denial of Service (ReDoS) attacks. These patterns, if fed with crafted or malicious input, may cause severe performance degradation due to catastrophic backtracking. #### 1. vllm/lora/utils.py [Line 173](https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/lora/utils.py#L173) https://github.com/vllm-project/vllm/blob/2858830c39da0
CVE-2025-47277FixedvLLM Allows Remote Code Execution via PyNcclPipe Communication Service
### Impacted Environments This issue ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected. ### Summary vLLM supports the use of the `PyNcclPipe` class to establish a peer-to-peer communication domain for data transmission between distributed nodes. The GPU-side KV-Cache transmission is implemented through the `PyNcclCommunicator` class, while CPU-side control message passing is handled via the `send_obj` and `re
CVE-2025-30165FixedRemote Code Execution Vulnerability in vLLM Multi-Node Cluster Configuration
### Affected Environments Note that this issue only affects the V0 engine, which has been off by default since v0.8.0. Further, the issue only applies to a deployment using tensor parallelism across multiple hosts, which we do not expect to be a common deployment pattern. Since V0 is has been off by default since v0.8.0 and the fix is fairly invasive, we have decided not to fix this issue. Instead we recommend that users ensure their environment is on a secure network in case this pattern is i
CVE-2025-32444FixedPYSEC-2025-42
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack. vLLM instances that do not make
CVE-2025-46560Fixedphi4mm: Quadratic Time Complexity in Input Token Processing leads to denial of service
### Summary A critical performance vulnerability has been identified in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_*|>, <|image_*|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)), allowing malicious actors to trigger resource exhaustion via specially crafted inputs. ### Details Affected Component
CVE-2025-30202FixedData exposure via ZeroMQ on multi-node vLLM deployment
### Impact In a multi-node vLLM deployment, vLLM uses ZeroMQ for some multi-node communication purposes. The primary vLLM host opens an `XPUB` ZeroMQ socket and binds it to ALL interfaces. While the socket is always opened for a multi-node deployment, it is only used when doing tensor parallelism across multiple hosts. Any client with network access to this host can connect to this `XPUB` socket unless its port is blocked by a firewall. Once connected, these arbitrary clients will receive all o
GHSA-ggpf-24jw-3fcwFixedCVE-2025-24357 Malicious model remote code execution fix bypass with PyTorch < 2.6.0
## Description https://github.com/vllm-project/vllm/security/advisories/GHSA-rh4j-5rhw-hr54 reported a vulnerability where loading a malicious model could result in code execution on the vllm host. The fix applied to specify `weights_only=True` to calls to `torch.load()` did not solve the problem prior to PyTorch 2.6.0. PyTorch has issued a new CVE about this problem: https://github.com/advisories/GHSA-53q9-r3pm-6pq6 This means that versions of vLLM using PyTorch before 2.6.0 are vulnerable t
GHSA-hf3c-wxg2-49q9FixedvLLM vulnerable to Denial of Service by abusing xgrammar cache
### Impact This report is to highlight a vulnerability in XGrammar, a library used by the structured output feature in vLLM. The XGrammar advisory is here: https://github.com/mlc-ai/xgrammar/security/advisories/GHSA-389x-67px-mjg3 The [xgrammar](https://xgrammar.mlc.ai/docs/) library is the default backend used by vLLM to support structured output (a.k.a. guided decoding). Xgrammar provides a required, built-in cache for its compiled grammars stored in RAM. xgrammar is available by default thr
CVE-2024-9053OpenvLLM allows Remote Code Execution by Pickle Deserialization via AsyncEngineRPCServer() RPC server entrypoints
vllm-project vllm version 0.6.0 contains a vulnerability in the AsyncEngineRPCServer() RPC server entrypoints. The core functionality run_server_loop() calls the function _make_handler_coro(), which directly uses cloudpickle.loads() on received messages without any sanitization. This can result in remote code execution by deserializing malicious pickle data.
CVE-2024-9052OpenvLLM deserialization vulnerability in vllm.distributed.GroupCoordinator.recv_object
vllm-project vllm version 0.6.0 contains a vulnerability in the distributed training API. The function vllm.distributed.GroupCoordinator.recv_object() deserializes received object bytes using pickle.loads() without sanitization, leading to a remote code execution vulnerability. ### Maintainer perspective Note that vLLM does NOT use the code as described in the report on huntr. The problem only exists if you use these internal APIs in a way that exposes them to a network as described. The vllm t
CVE-2024-11041OpenvLLM Deserialization of Untrusted Data vulnerability
vllm-project vllm version v0.6.2 contains a vulnerability in the MessageQueue.dequeue() API function. The function uses pickle.loads to parse received sockets directly, leading to a remote code execution vulnerability. An attacker can exploit this by sending a malicious payload to the MessageQueue, causing the victim's machine to execute arbitrary code.
CVE-2025-29783FixedPYSEC-2025-63
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP on all network interfaces will allow attackers to execute remote code on distributed hosts. This is a remote code execution vulnerability impacting any deployments using Mooncake to distribute KV across distributed hosts. This vulnerability is fixed in 0.8.0.
CVE-2025-29770FixedvLLM denial of service via outlines unbounded cache on disk
### Impact The [outlines](https://dottxt-ai.github.io/outlines/latest/) library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem. This cache has been on by default in vLLM. Outlines is also available by default through the OpenAI compatible API server. The affected code in vLLM is [vllm/model_executor/guided_decoding/outlines_logits_processors.py](https://github.com/vl
CVE-2025-25183FixedPYSEC-2025-62
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try exploit hash collisions. The impact of
CVE-2025-24357FixedPYSEC-2025-58
vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from huggingface. It uses the torch.load function and the weights_only parameter defaults to False. When torch.load loads malicious pickle data, it will execute arbitrary code during unpickling. This vulnerability is fixed in v0.7.0.
CVE-2024-8768FixedvLLM denial of service vulnerability
A flaw was found in the vLLM library. A completions API request with an empty prompt will crash the vLLM API server, resulting in a denial of service.
CVE-2024-8939OpenvLLM Denial of Service via the best_of parameter
A vulnerability was found in the ilab model serve component, where improper handling of the best_of parameter in the vllm JSON web API can lead to a Denial of Service (DoS). The API used for LLM-based sentence or chat completion accepts a best_of parameter to return the best completion from several options. When this parameter is set to a large value, the API does not handle timeouts or resource exhaustion properly, allowing an attacker to cause a DoS by consuming excessive system resources. Thi