x41 Security — CVE-2026-48746 vLLM Authentication Bypass Affects Versions 0.3.0–0.22.0

What happened

  • x41 Security discovered CVE-2026-48746, an authentication bypass in vLLM's OpenAI-compatible API layer. It affects versions 0.3.0 through 0.22.0, an unusually wide range that spans the majority of vLLM's release history.
  • The flaw lies in the interaction between the ASGI web server and Starlette's AuthenticationMiddleware. vLLM trusts the authentication state established at that layer, but the middleware can be bypassed, so requests reach the API without presenting the key configured through VLLM_API_KEY or --api-key.
  • As a result, any attacker who can reach the server over the network can use the full vLLM inference API, including model loading, prompt submission, and response retrieval, on deployments whose operators believe they are protected by API key authentication.
  • vLLM is one of the most widely deployed LLM inference engines, powering production serving infrastructure for hundreds of organizations. Given the affected range (0.3.0 to 0.22.0), most existing deployments are likely vulnerable unless they already run a patched release.
  • The fix was merged in PR #43426 and shipped in vLLM 0.22.0+. The GitHub advisory GHSA-94f4-hr76-p5j6 provides full technical details.
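
One quick way to confirm that a deployment enforces its API key at all is to send a request with no Authorization header and check that it is refused. The sketch below does this against /v1/models, the model-listing route of the OpenAI-compatible API. The function names and verdict strings are illustrative assumptions, not part of vLLM. The source does not describe the request that triggers the bypass, so a "rejected" verdict here only shows that plain unauthenticated requests are refused; it does not show that the instance is patched.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map the HTTP status of an unauthenticated request to a rough verdict."""
    if status in (401, 403):
        return "rejected"      # the server refused the request without a key
    if 200 <= status < 300:
        return "exposed"       # the request was served without a key
    return "inconclusive"      # 404, 5xx, and similar need manual review


def probe_without_key(base_url: str, timeout: float = 5.0) -> str:
    """Send GET {base_url}/v1/models with no Authorization header."""
    url = f"{base_url.rstrip('/')}/v1/models"
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is still what we want.
        return classify_status(err.code)
    except OSError:
        # Covers URLError (connection refused, DNS failure) and timeouts.
        return "unreachable"
```

Calling probe_without_key("http://localhost:8000") against an instance you operate returns one of the four verdict strings; the host and port here are placeholders. Run it only against systems you are authorized to test.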

Why it matters

  • vLLM inference servers often hold access to expensive GPU resources, proprietary model weights, and internal API keys for upstream providers. An auth bypass turns a supposedly private inference endpoint into a public one.
  • Many vLLM deployments rely on API key authentication as their primary access control. This bypass removes that control entirely, and nothing in the configuration looks wrong.
  • The wide affected range (0.3.0–0.22.0) means organizations that are even slightly behind on updates are exposed. The vulnerability was present for a significant portion of vLLM's production history.
  • Combined with vLLM's other recent CVEs (CVE-2026-22778 for video RCE, CVE-2026-54235 for code injection, CVE-2026-41523 for activation function RCE), this auth bypass can serve as an initial access vector for deeper exploitation chains.

What to do

  • Update vLLM to version 0.22.0 or later immediately. Verify the running version of every vLLM instance, because container images and cached pip installs may contain older versions.
  • Do not rely solely on API key authentication for vLLM access. Add network-level controls: restrict inference endpoints to internal networks, use reverse proxies with independent authentication, and enforce TLS.
  • Audit access logs for vLLM instances running affected versions to identify any unauthorized API usage during the exposure window.
  • If running behind a reverse proxy or API gateway, verify that the proxy enforces authentication on its own rather than passing requests through and relying on the vLLM-level API key.
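
For the version check, a minimal sketch along the following lines can be run inside each container or virtual environment. The helper names are hypothetical. One labeled assumption: this brief lists 0.22.0 both as the upper end of the affected range and as the release carrying the fix, so the sketch errs on the side of caution and treats 0.22.0 as affected. Confirm the exact boundary against GHSA-94f4-hr76-p5j6 and adjust AFFECTED_MAX accordingly.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

AFFECTED_MIN = (0, 3, 0)
# Assumption: 0.22.0 is treated as affected; confirm against the advisory.
AFFECTED_MAX = (0, 22, 0)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract the leading major.minor.patch from strings like '0.6.3.post1'."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def in_affected_range(text: str) -> bool:
    """True if the version falls inside the range given in this brief."""
    return AFFECTED_MIN <= parse_version(text) <= AFFECTED_MAX


def installed_vllm_version() -> Optional[str]:
    """Return the installed vllm distribution version, or None if absent."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_vllm_version()
    if version is None:
        print("vllm is not installed in this environment")
    elif in_affected_range(version):
        print(f"vllm {version} is in the affected range; update")
    else:
        print(f"vllm {version} is outside the affected range")
```

Suffixes such as .post1 or rc1 are ignored by the parser, so a pre-release of a fixed version is reported by its base version; check such cases by hand.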

Sources