vLLM — Two New CVEs Expose AI Inference Servers to Code Injection and NaN Exploits
AI relevance: vLLM powers production inference for thousands of deployments; unauthenticated RCE or GPU parameter manipulation directly compromises model serving integrity and availability.
- CVE-2026-41523 (High): A security check in vLLM's activation function loading is implemented with `assert`, so it is removed when Python runs with optimizations enabled (`python -O` or `PYTHONOPTIMIZE=1`). With the check gone, an unauthenticated attacker can achieve remote code execution on the inference server.
- CVE-2026-54235 (Medium): Temperature parameter validation relies on comparison operators that do not reject `NaN` or positive `Infinity` float values (every comparison involving `NaN` evaluates to `False`). These values therefore bypass the guards and propagate to GPU kernels.
- Both vulnerabilities were published on June 22, 2026, and affect vLLM versions prior to 0.23.1rc0.
- CVE-2026-41523 is particularly dangerous in containerized deployments, where Python is often run with `-O` as a performance setting.
- CVE-2026-54235 could enable denial of service or unpredictable model behavior: malformed temperature values can crash or destabilize GPU kernels.
- These are distinct from the earlier vLLM CVEs covered on June 1 (CVE-2026-22778 heap leak, CVE-2026-34756 DoS), a sign that vLLM's security surface keeps expanding.
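The comparison pitfall behind CVE-2026-54235 is easy to reproduce in plain Python. The sketch below is not vLLM's code. `naive_is_rejected` is a hypothetical guard that only checks for negative values. It is contrasted with a guard that calls `math.isfinite` before any range comparison.

```python
import math


def naive_is_rejected(temperature: float) -> bool:
    """Hypothetical guard: rejects only values that compare as negative."""
    # Every comparison involving NaN is False, and +inf is not < 0,
    # so both values slip through this check.
    return temperature < 0


def strict_is_rejected(temperature: float) -> bool:
    """Rejects NaN and +/-inf before applying the range check."""
    return not math.isfinite(temperature) or temperature < 0


if __name__ == "__main__":
    nan = float("nan")
    # NaN is neither below zero nor at-or-above zero.
    print(nan < 0, nan >= 0)  # False False
    for value in (0.7, -1.0, nan, float("inf")):
        print(f"{value!r:>6} naive={naive_is_rejected(value)} strict={strict_is_rejected(value)}")
```

It is safer to test for the valid case, for example `math.isfinite(t) and t >= 0`, than to test for the invalid one. A `NaN` fails any check written that way.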
Why it matters
vLLM is the dominant open-source inference framework for production LLM serving. When the inference layer becomes an execution surface, attackers can pivot from model abuse to host compromise. The assert bypass is a classic Python anti-pattern. Python strips `assert` statements when run with `-O` or `PYTHONOPTIMIZE=1`, so security checks written as asserts silently vanish under those optimization flags.
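You can observe the difference between an `assert` guard and an explicit check by running the same snippet with and without `-O`. This is a minimal sketch, not vLLM's activation-loading code. The allow-list, the `name` value, and the `run` helper are all hypothetical.

```python
import os
import subprocess
import sys

# Guard written with assert: the whole statement is compiled away under -O.
ASSERT_GUARD = """
ALLOWED = {"relu", "gelu", "silu"}   # hypothetical allow-list
name = "os.system"                   # hypothetical untrusted input
assert name in ALLOWED, "disallowed activation"
print("guard passed")
"""

# Same guard written as an explicit check: unaffected by -O.
EXPLICIT_GUARD = """
ALLOWED = {"relu", "gelu", "silu"}
name = "os.system"
if name not in ALLOWED:
    raise ValueError("disallowed activation")
print("guard passed")
"""


def run(snippet: str, optimize: bool) -> tuple[int, str]:
    """Run snippet in a fresh interpreter; return (exit code, stdout)."""
    # Drop PYTHONOPTIMIZE so only the explicit -O flag controls optimization.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    cmd = [sys.executable] + (["-O"] if optimize else []) + ["-c", snippet]
    result = subprocess.run(cmd, capture_output=True, text=True, env=env)
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    # sys.flags.optimize is 0 normally, 1 under -O, and 2 under -OO.
    print("this interpreter: optimize =", sys.flags.optimize)
    for label, snippet in (("assert", ASSERT_GUARD), ("explicit", EXPLICIT_GUARD)):
        for optimize in (False, True):
            print(label, "-O" if optimize else "  ", run(snippet, optimize))
```

Under `-O`, only the assert-based guard lets the disallowed value through. Writing the check as `if ...: raise` keeps it in force regardless of interpreter flags. Logging `sys.flags.optimize` at startup is one inexpensive way to support a configuration audit.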
What to do
- Upgrade vLLM to ≥ 0.23.1rc0 immediately.
- Audit deployment configs: if you run `python -O` or set `PYTHONOPTIMIZE=1`, treat CVE-2026-41523 as critical.
- Add input validation at the API gateway layer to reject `NaN`/`Infinity` temperature values before they reach vLLM.
- Monitor GPU telemetry for anomalous behavior following suspicious inference requests.
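A gateway-side temperature check could look like the sketch below. It assumes a Python gateway that receives the raw JSON request body. `validate_temperature`, `RejectedRequest`, and the `MAX_TEMPERATURE` bound are hypothetical and should be matched to your own policy. One detail matters here. Python's `json.loads` accepts the non-standard tokens `NaN`, `Infinity`, and `-Infinity` by default, and a literal such as `1e999` overflows to `inf`. Both paths therefore need explicit rejection.

```python
import json
import math
from typing import Optional

MAX_TEMPERATURE = 2.0  # hypothetical upper bound; set per deployment policy


class RejectedRequest(ValueError):
    """Hypothetical error type: the gateway should refuse this request."""


def _reject_constant(token: str) -> float:
    # json.loads calls this for the tokens "NaN", "Infinity", and "-Infinity".
    raise RejectedRequest(f"non-finite JSON constant not allowed: {token}")


def validate_temperature(raw_body: str) -> Optional[float]:
    """Return the request's temperature if safe, None if absent; raise otherwise."""
    body = json.loads(raw_body, parse_constant=_reject_constant)
    if not isinstance(body, dict):
        raise RejectedRequest("request body must be a JSON object")
    if "temperature" not in body:
        return None  # nothing to validate at the gateway
    temperature = body["temperature"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
        raise RejectedRequest("temperature must be a number")
    try:
        temperature = float(temperature)
    except OverflowError:  # a huge JSON integer cannot become a float
        raise RejectedRequest("temperature out of range") from None
    # Overflowing literals like 1e999 become inf without calling parse_constant.
    if not math.isfinite(temperature):
        raise RejectedRequest("temperature must be finite")
    # Phrased as "must lie inside the range", so a NaN would fail it as well.
    if not 0.0 <= temperature <= MAX_TEMPERATURE:
        raise RejectedRequest("temperature out of range")
    return temperature


if __name__ == "__main__":
    for body in ('{"temperature": 0.7}', '{"temperature": NaN}',
                 '{"temperature": Infinity}', '{"temperature": 1e999}'):
        try:
            print(body, "->", validate_temperature(body))
        except RejectedRequest as exc:
            print(body, "-> rejected:", exc)
```

Rejecting at parse time with `parse_constant` produces a clear error for the literal tokens. The later `math.isfinite` check covers values that only become non-finite during float conversion.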