vLLM — Two New CVEs Expose AI Inference Servers to Code Injection and NaN Exploits

AI relevance: vLLM powers production inference for thousands of deployments; unauthenticated RCE, or non-finite sampling parameters reaching GPU kernels, directly compromises the integrity and availability of model serving.

  • CVE-2026-41523 (High): A security check in vLLM's activation function loading is implemented as an assert. Python strips assert statements when it runs with optimizations enabled (python -O or PYTHONOPTIMIZE=1), so the check disappears and unauthenticated remote code execution on the inference server becomes possible.
  • CVE-2026-54235 (Medium): Temperature parameter validation relies on comparisons that evaluate to False for NaN and positive Infinity float values, so these values pass the guards silently and propagate to GPU kernels.
  • Both vulnerabilities were published June 22, 2026, affecting vLLM versions prior to 0.23.1rc0.
  • CVE-2026-41523 is particularly dangerous in containerized deployments, where images sometimes run Python with -O or set PYTHONOPTIMIZE=1 in the hope of a performance gain.
  • CVE-2026-54235 could enable denial-of-service or unpredictable model behavior by sending malformed temperature parameters that crash or destabilize GPU kernels.
  • These are distinct from the earlier vLLM CVEs covered on June 1 (CVE-2026-22778 heap leak, CVE-2026-34756 DoS), a sign that vLLM's attack surface keeps growing.
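
To make the CVE-2026-54235 failure mode concrete, here is a minimal Python sketch of how a comparison-only guard lets non-finite floats through. The function names naive_check and strict_check are hypothetical and are not taken from vLLM's source; the sketch only illustrates the general pattern described above.

```python
import math


def naive_check(temperature: float) -> float:
    """Hypothetical guard (not vLLM's code) that relies on a single comparison."""
    # NaN < 0.0 is False and inf < 0.0 is False, so neither value is rejected.
    if temperature < 0.0:
        raise ValueError("temperature must be non-negative")
    return temperature


def strict_check(temperature: float) -> float:
    """Same guard with an explicit finiteness test placed before the comparison."""
    if not math.isfinite(temperature) or temperature < 0.0:
        raise ValueError("temperature must be a finite, non-negative number")
    return temperature


if __name__ == "__main__":
    for value in (float("nan"), float("inf"), -1.0, 0.7):
        for check in (naive_check, strict_check):
            try:
                check(value)
                outcome = "accepted"
            except ValueError:
                outcome = "rejected"
            print(f"{check.__name__}({value!r}): {outcome}")
```

The only difference between the two guards is the math.isfinite call. Any validation that expresses "reject if out of range" purely through < or > inherits the same blind spot for NaN.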

Why it matters

vLLM is the dominant open-source inference framework for production LLM serving. When the inference layer becomes an execution surface, attackers can pivot from model abuse to host compromise. The assert bypass is a classic Python anti-pattern: security checks that vanish under optimization flags.
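
The anti-pattern can be reproduced without restarting the interpreter, because Python's built-in compile() accepts an optimize argument that mirrors the -O flag. The sketch below is illustrative only: load_activation, ALLOWED_ACTIVATIONS, and the allow-list contents are hypothetical stand-ins, not vLLM's actual loader.

```python
# Hypothetical loader source, NOT vLLM's code: the allow-list check is an assert.
GUARDED_SOURCE = '''
def load_activation(name):
    assert name in ALLOWED, f"activation {name!r} is not allowed"
    return name
'''

# Illustrative allow-list; the real set of activation names is not shown here.
ALLOWED_ACTIVATIONS = frozenset({"gelu", "silu"})


def build_loader(optimize: int):
    """Compile the assert-guarded loader at the given optimization level.

    optimize=0 keeps assert statements; optimize=1 strips them, as python -O does.
    """
    namespace = {"ALLOWED": ALLOWED_ACTIVATIONS}
    code = compile(GUARDED_SOURCE, "<assert-demo>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["load_activation"]


def load_activation_safe(name: str) -> str:
    """Explicit check that survives -O because it is an ordinary if/raise."""
    if name not in ALLOWED_ACTIVATIONS:
        raise ValueError(f"activation {name!r} is not allowed")
    return name


if __name__ == "__main__":
    try:
        build_loader(0)("not_on_the_list")
    except AssertionError as exc:
        print("unoptimized build rejected the name:", exc)
    # Under optimization the assert is gone, so the untrusted name passes through.
    print("optimized build returned:", build_loader(1)("not_on_the_list"))
```

A running service can detect the risky mode at startup: sys.flags.optimize is nonzero when the interpreter was started with -O or with PYTHONOPTIMIZE set.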

What to do

  • Upgrade vLLM to ≥ 0.23.1rc0 immediately.
  • Audit deployment configs: if using python -O or PYTHONOPTIMIZE=1, treat CVE-2026-41523 as critical.
  • Add input validation at the API gateway layer to reject NaN/Infinity temperature values before they reach vLLM.
  • Monitor GPU telemetry for anomalous behavior following suspicious inference requests.
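
As one possible shape for the gateway-side check, the sketch below parses a request body and rejects non-finite temperature values before anything is forwarded. It assumes a JSON body with an optional top-level "temperature" field; parse_request_body and _reject_constant are hypothetical names, and a real gateway would apply the same idea in its own framework. Python's json.loads accepts the non-standard literals NaN and Infinity by default, and an oversized literal such as 1e999 parses to infinity, so both paths are covered.

```python
import json
import math


def _reject_constant(token: str):
    # json.loads calls this hook for the literals NaN, Infinity and -Infinity.
    raise ValueError(f"non-finite JSON constant rejected: {token}")


def parse_request_body(raw: str) -> dict:
    """Parse a request body and refuse non-finite or negative temperatures.

    Hypothetical gateway helper; the field name "temperature" is an assumption.
    """
    body = json.loads(raw, parse_constant=_reject_constant)
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")

    temperature = body.get("temperature")
    if temperature is None:
        return body  # field absent: leave defaults to the backend

    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
        raise ValueError("temperature must be a number")
    try:
        value = float(temperature)
    except OverflowError:
        raise ValueError("temperature is out of range") from None
    # 1e999 parses to inf without touching parse_constant, so test finiteness too.
    if not math.isfinite(value) or value < 0.0:
        raise ValueError("temperature must be a finite, non-negative number")
    return body


if __name__ == "__main__":
    for raw in ('{"temperature": 0.7}', '{"temperature": NaN}', '{"temperature": 1e999}'):
        try:
            parse_request_body(raw)
            print(raw, "-> forwarded")
        except ValueError as exc:
            print(raw, "-> rejected:", exc)
```

Rejecting at the parser and again at the field level is deliberate redundancy: the parse_constant hook catches the literal spellings, while the math.isfinite test catches values that only become infinite during float conversion.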

Sources