• Issue: vLLM deserializes user-supplied prompt embeddings with torch.load() without sufficient validation.
  • Root cause: PyTorch 2.8.0 disables sparse tensor integrity checks by default, allowing malicious tensors to bypass bounds checks.
  • Exploit path: Crafted tensors trigger an out-of-bounds write during to_dense() conversion.
  • Impact: Memory corruption can crash the server (DoS) and potentially enable remote code execution.
  • Affected versions: vLLM 0.10.2 up to (but not including) 0.11.1.
  • Fix: Patched in vLLM 0.11.1 with stronger validation.
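The core of the bug is a missing bounds check: a sparse tensor's indices must be validated against its declared shape before densification, or writes land outside the dense buffer. The sketch below illustrates that check in plain Python over index lists (no torch dependency); the function name and structure are illustrative, not the actual vLLM patch.

```python
def sparse_indices_in_bounds(indices, shape):
    """Return True if every COO index fits inside the declared shape.

    indices: list of index tuples, e.g. [(0, 1), (2, 3)]
    shape:   the tensor shape, e.g. (4, 4)

    Skipping this check and writing values at raw index offsets is the
    out-of-bounds write described above: an index like (0, 10**9) against
    shape (4, 4) targets memory far past the dense buffer.
    """
    return all(
        len(idx) == len(shape)
        and all(0 <= i < dim for i, dim in zip(idx, shape))
        for idx in indices
    )


# A well-formed sparse payload passes; a crafted one fails the check.
ok = sparse_indices_in_bounds([(0, 1), (3, 3)], (4, 4))   # True
bad = sparse_indices_in_bounds([(0, 10**9)], (4, 4))      # False
```

PyTorch itself ships an opt-in version of this validation: `torch.sparse.check_sparse_tensor_invariants.enable()` makes constructing a malformed sparse tensor raise an error rather than deferring the problem to `to_dense()`.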

Why it matters

Inference servers are often exposed behind APIs for internal apps, copilots, or external customers. A memory corruption path reachable from the Completions API means attackers can pivot from a single request to a service-level outage or code execution on the host running your model.

What to do

  • Upgrade: Move to vLLM 0.11.1 or later.
  • Harden endpoints: Gate Completions API access with auth and network controls; avoid exposing unauthenticated endpoints.
  • Watch for anomalies: Add monitoring for crashes and suspicious embedding payloads.
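A quick way to verify the first mitigation is a version gate at startup. This is a minimal sketch that compares an installed version string against the patched release; it assumes plain X.Y.Z version strings (production code would typically use `packaging.version` instead), and the function names are illustrative.

```python
def parse_version(v):
    """Parse 'X.Y.Z' into a comparable tuple of ints.

    Assumes a plain numeric version; local/build suffixes after '+'
    are dropped, pre-release tags are not handled.
    """
    return tuple(int(part) for part in v.split("+")[0].split(".")[:3])


def is_patched(installed, fixed="0.11.1"):
    """True if the installed vLLM version includes the fix (>= 0.11.1)."""
    return parse_version(installed) >= parse_version(fixed)


# Checks against the affected range stated above:
assert not is_patched("0.10.2")  # first affected release
assert not is_patched("0.11.0")  # still vulnerable
assert is_patched("0.11.1")      # patched
```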

Further reading

  • Read the GitHub advisory
  • NVD entry
  • Patch PR