vLLM — Two New CVEs Expose AI Inference Servers to Code Injection and NaN Exploits

2026-06-23 AI CVEs by al-ice.ai Editorial

AI relevance: vLLM powers production inference for thousands of deployments; unauthenticated RCE or GPU parameter manipulation directly compromises model serving integrity and availability.

CVE-2026-41523 (High): An assert-based security check in vLLM's activation function loading can be bypassed when Python runs with optimizations enabled (python -O or PYTHONOPTIMIZE=1), allowing unauthenticated remote code execution on the inference server.
CVE-2026-54235 (Medium): Temperature parameter validation relies on comparison operators that evaluate to False for NaN and positive Infinity, so these float values slip past the guards and propagate to GPU kernels.
Both vulnerabilities were published June 22, 2026, affecting vLLM versions prior to 0.23.1rc0.
CVE-2026-41523 is particularly dangerous in containerized deployments where Python is often run with -O for performance optimization.
CVE-2026-54235 could enable denial-of-service or unpredictable model behavior by sending malformed temperature parameters that crash or destabilize GPU kernels.
These are distinct from the earlier vLLM CVEs covered on June 1 (CVE-2026-22778 heap leak, CVE-2026-34756 DoS), a sign that vLLM's attack surface is still growing.
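
To illustrate the class of bug behind CVE-2026-54235, here is a minimal sketch of how a comparison-only range check lets NaN and positive Infinity through. The function names are hypothetical and this is not vLLM's actual validation code; it only shows the float behavior the advisory describes.

```python
import math


def naive_temperature_ok(temperature: float) -> bool:
    """Hypothetical guard that rejects only values comparing as negative."""
    # NaN compares False against everything, and +inf is not < 0,
    # so neither value triggers this rejection branch.
    if temperature < 0.0:
        return False
    return True


def strict_temperature_ok(temperature: float) -> bool:
    """Checks finiteness first, then applies the same lower bound."""
    return math.isfinite(temperature) and temperature >= 0.0


nan = float("nan")
inf = float("inf")

print(naive_temperature_ok(nan), naive_temperature_ok(inf))
print(strict_temperature_ok(nan), strict_temperature_ok(inf))
```

The design point is that a finiteness test such as math.isfinite has to run before any range comparison, because the comparison alone cannot distinguish NaN from a valid value.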

Why it matters

vLLM is the dominant open-source inference framework for production LLM serving. When the inference layer becomes an execution surface, attackers can pivot from model abuse to host compromise. The assert bypass stems from a classic Python anti-pattern: the interpreter strips assert statements when it runs with optimization flags, so any security check written as an assert vanishes with them.
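
The assert anti-pattern can be reproduced without restarting the interpreter, because the built-in compile() accepts an optimize argument that mirrors the -O flag. The sketch below is illustrative only: load_activation and its allow-list are hypothetical stand-ins, not vLLM's real loader.

```python
SOURCE = '''
def load_activation(name):
    # Hypothetical allow-list check written as an assert.
    assert name in {"relu", "gelu"}, "unknown activation"
    return name
'''


def build(optimize: int):
    """Compile SOURCE at the given optimization level and return the function."""
    namespace = {}
    code = compile(SOURCE, "<sketch>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["load_activation"]


checked = build(optimize=0)   # behaves like plain `python`
stripped = build(optimize=1)  # behaves like `python -O`: asserts are removed

ALLOWED = {"relu", "gelu"}


def load_activation_safe(name: str) -> str:
    """Same check as an explicit raise, which survives every optimization level."""
    if name not in ALLOWED:
        raise ValueError(f"unknown activation: {name!r}")
    return name
```

Calling checked("evil") raises AssertionError, while stripped("evil") returns the untrusted name unchanged. Only the explicit raise in load_activation_safe rejects it regardless of how Python was started.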

What to do

Upgrade vLLM to ≥ 0.23.1rc0 immediately.
Audit deployment configs: if using python -O or PYTHONOPTIMIZE=1, treat CVE-2026-41523 as critical.
Add input validation at the API gateway layer to reject NaN/Infinity temperature values before they reach vLLM.
Monitor GPU telemetry for anomalous behavior following suspicious inference requests.
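
As a stopgap alongside the upgrade, the gateway and audit steps above might look roughly like the following sketch. It assumes requests arrive as JSON with an optional top-level "temperature" field; parse_request and running_optimized are hypothetical helper names, not part of vLLM. Note that Python's json module accepts the NaN and Infinity literals by default, and an oversized number such as 1e999 parses to infinity, so both paths need covering.

```python
import json
import math
import sys


def _reject_constant(token: str):
    # json calls this hook for the literals NaN, Infinity and -Infinity.
    raise ValueError(f"non-finite JSON constant rejected: {token}")


def parse_request(body: str) -> dict:
    """Parse a request body and reject non-finite temperature values."""
    payload = json.loads(body, parse_constant=_reject_constant)
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    temperature = payload.get("temperature")
    if temperature is not None:
        if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
            raise ValueError("temperature must be a number")
        try:
            finite = math.isfinite(temperature)
        except OverflowError:
            # Integers too large to convert to float are treated as invalid.
            finite = False
        if not finite:
            raise ValueError("temperature must be finite")
    return payload


def running_optimized() -> bool:
    """True when the interpreter was started with -O or PYTHONOPTIMIZE set."""
    return sys.flags.optimize > 0
```

A deployment health check could log running_optimized() at startup to flag hosts where CVE-2026-41523 should be treated as critical. The validation layer reduces exposure but does not replace upgrading.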
