GitHub Advisory — vLLM Completions API RCE (CVE-2025-62164)
AI relevance: vLLM is a common LLM inference server, and this bug sits in its Completions API prompt-embedding path, exposing AI serving endpoints to memory corruption and potential RCE.
- Issue: vLLM deserializes user-supplied prompt embeddings with `torch.load()` without sufficient validation.
- Root cause: PyTorch 2.8.0 disables sparse tensor integrity checks by default, allowing malicious tensors to bypass bounds checks.
- Exploit path: Crafted sparse tensors trigger an out-of-bounds write during `to_dense()` conversion.
- Impact: Memory corruption can crash the server (DoS) and potentially enable remote code execution.
- Affected versions: vLLM 0.10.2 up to (but not including) 0.11.1.
- Fix: Patched in vLLM 0.11.1 with stronger validation.
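To make the exploit path concrete, here is a minimal, illustrative sketch of the kind of pre-deserialization check the fix implies: reject sparse layouts and out-of-range shapes before any `to_dense()`-style conversion runs. It uses a stand-in metadata class rather than real torch tensors, and every name (`TensorMeta`, `MAX_EMBED_DIM`, `validate_prompt_embedding`) is hypothetical, not vLLM's actual patched code.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a decoded tensor header; real server code
# would inspect an actual torch.Tensor. All names here are illustrative.
@dataclass
class TensorMeta:
    layout: str     # e.g. "strided" (dense) or "sparse_coo"
    dtype: str      # e.g. "float32"
    shape: tuple    # e.g. (1, 4096)

MAX_EMBED_DIM = 8192  # assumed server-side limit, not from the advisory

def validate_prompt_embedding(meta: TensorMeta) -> None:
    """Reject suspicious payloads before any dense conversion runs."""
    if meta.layout != "strided":
        # Sparse layouts are where the CVE's bounds-check bypass lives.
        raise ValueError("sparse tensor layouts are not accepted")
    if meta.dtype not in {"float16", "bfloat16", "float32"}:
        raise ValueError(f"unexpected dtype: {meta.dtype}")
    if len(meta.shape) != 2 or meta.shape[1] > MAX_EMBED_DIM:
        raise ValueError(f"embedding shape out of bounds: {meta.shape}")
```

The point of the sketch is the ordering: validation happens on cheap metadata before the expensive, memory-unsafe conversion, so a malicious sparse payload is rejected without ever touching the vulnerable path.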
Why it matters
Inference servers are often exposed behind APIs for internal apps, copilots, or external customers. A memory corruption path reachable from the Completions API means attackers can pivot from a single request to a service-level outage or code execution on the host running your model.
What to do
- Upgrade: Move to vLLM 0.11.1 or later.
- Harden endpoints: Gate Completions API access with auth and network controls; avoid exposing unauthenticated endpoints.
- Watch for anomalies: Monitor for server crashes and worker restarts, and for unusually large or sparse embedding payloads in Completions API traffic.
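As a quick triage aid, the affected range above (0.10.2 inclusive to 0.11.1 exclusive) can be checked with a small sketch. This is a naive `X.Y.Z` parser for illustration only; it does not handle pre-release or local version suffixes, for which a real tool like `packaging.version` would be the robust choice.

```python
def parse_version(v: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def is_affected(version: str) -> bool:
    """True if a vLLM version falls in the advisory's affected range:
    0.10.2 <= version < 0.11.1 (patched in 0.11.1)."""
    v = parse_version(version)
    return parse_version("0.10.2") <= v < parse_version("0.11.1")
```

For example, `is_affected("0.11.0")` is true while `is_affected("0.11.1")` is false, matching the fix boundary in the advisory.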