• Issue: vLLM deserializes user-supplied prompt embeddings with torch.load() without sufficient validation.
  • Root cause: PyTorch 2.8.0 disables sparse tensor integrity checks by default, allowing malicious tensors to bypass bounds checks.
  • Exploit path: Crafted tensors trigger an out-of-bounds write during to_dense() conversion.
  • Impact: Memory corruption can crash the server (DoS) and potentially enable remote code execution.
  • Affected versions: vLLM 0.10.2 up to (but not including) 0.11.1.
  • Fix: Patched in vLLM 0.11.1 with stronger validation.
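The core of the bug is a missing bounds check: a sparse tensor's indices must be validated against its declared shape before densification, or writes land outside the dense buffer. The sketch below illustrates that check in plain Python over index lists (no torch dependency); the function name and structure are illustrative, not the actual vLLM patch.

```python
def sparse_indices_in_bounds(indices, shape):
    """Return True if every COO index fits inside the declared shape.

    indices: list of index tuples, e.g. [(0, 1), (2, 3)]
    shape:   the tensor shape, e.g. (4, 4)

    Skipping this check and writing values at raw index offsets is the
    out-of-bounds write described above: an index like (0, 10**9) against
    shape (4, 4) targets memory far past the dense buffer.
    """
    return all(
        len(idx) == len(shape)
        and all(0 <= i < dim for i, dim in zip(idx, shape))
        for idx in indices
    )


# A well-formed sparse payload passes; a crafted one fails the check.
ok = sparse_indices_in_bounds([(0, 1), (3, 3)], (4, 4))   # True
bad = sparse_indices_in_bounds([(0, 10**9)], (4, 4))      # False
```

PyTorch itself ships an opt-in version of this validation: `torch.sparse.check_sparse_tensor_invariants.enable()` makes constructing a malformed sparse tensor raise an error rather than deferring the problem to `to_dense()`.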

Why it matters

Inference servers are often exposed behind APIs for internal apps, copilots, or external customers. A memory corruption path reachable from the Completions API means attackers can pivot from a single request to a service-level outage or code execution on the host running your model.

What to do

  • Upgrade: Move to vLLM 0.11.1 or later.
  • Harden endpoints: Gate Completions API access with auth and network controls; avoid exposing unauthenticated endpoints.
  • Watch for anomalies: Add monitoring for crashes and suspicious embedding payloads.
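A quick way to verify the first mitigation is a version gate at startup. This is a minimal sketch that compares an installed version string against the patched release; it assumes plain X.Y.Z version strings (production code would typically use `packaging.version` instead), and the function names are illustrative.

```python
def parse_version(v):
    """Parse 'X.Y.Z' into a comparable tuple of ints.

    Assumes a plain numeric version; local/build suffixes after '+'
    are dropped, pre-release tags are not handled.
    """
    return tuple(int(part) for part in v.split("+")[0].split(".")[:3])


def is_patched(installed, fixed="0.11.1"):
    """True if the installed vLLM version includes the fix (>= 0.11.1)."""
    return parse_version(installed) >= parse_version(fixed)


# Checks against the affected range stated above:
assert not is_patched("0.10.2")  # first affected release
assert not is_patched("0.11.0")  # still vulnerable
assert is_patched("0.11.1")      # patched
```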

Further reading

  • Read the GitHub advisory
  • NVD entry
  • Patch PR