vLLM CVE-2026-4944 — Hardcoded trust_remote_code Bypass Enables RCE

AI relevance: vLLM is the dominant open-source model serving engine. Because two of its model loaders hardcode trust_remote_code=True, even operators who explicitly disable remote code trust can still be compromised through malicious HuggingFace model repositories.

  • CVE-2026-4944 (CVSS 8.8) was published May 28, 2026 for vllm-project/vllm 0.14.1.
  • The vulnerability originates in two model implementation files: nemotron_vl.py and kimi_k25.py, which hardcode trust_remote_code=True.
  • This bypasses any user-configured --trust-remote-code=False setting, nullifying an explicit security control.
  • An attacker hosting a malicious model on HuggingFace can achieve RCE when a vulnerable vLLM instance loads a NemotronVL or KimiK25 model.
  • This is an incomplete fix for two prior CVEs, CVE-2025-66448 and CVE-2026-22807: the code paths patched for those issues were addressed, but these two model files were missed.
  • Exploitation requires only that the attacker can host the malicious model somewhere the victim will fetch it over the network; no authentication to the victim vLLM instance is needed.
  • No public PoC at time of writing; attack complexity is rated High.
  • The pattern repeats a known class of failures in HuggingFace model loading: custom tokenizer/model code crosses a trust boundary, and executing it should be an operator decision that library code never forces on.
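
The bypass described in the bullets above can be illustrated with a minimal sketch. Nothing here is vLLM's actual code: fake_from_pretrained, load_hardcoded, and load_respecting_config are hypothetical names standing in for a HuggingFace-style loader and two ways a model file might call it.

```python
# Hypothetical stand-in for a HuggingFace-style loader. It is NOT a real
# vLLM or transformers function; it only records whether repository code
# would have been executed.
def fake_from_pretrained(repo_id: str, trust_remote_code: bool = False) -> dict:
    return {"repo_id": repo_id, "remote_code_executed": trust_remote_code}


def load_hardcoded(repo_id: str, trust_remote_code: bool) -> dict:
    # Vulnerable pattern: the operator's setting is accepted but ignored,
    # because the literal True is passed to the loader.
    return fake_from_pretrained(repo_id, trust_remote_code=True)


def load_respecting_config(repo_id: str, trust_remote_code: bool) -> dict:
    # Safer pattern: the operator's setting is passed through unchanged.
    return fake_from_pretrained(repo_id, trust_remote_code=trust_remote_code)


if __name__ == "__main__":
    operator_setting = False  # operator has disabled remote code
    print(load_hardcoded("attacker/model", operator_setting))
    print(load_respecting_config("attacker/model", operator_setting))
```

With the operator setting at False, only the second function leaves remote code disabled; the first silently re-enables it, which is the class of bug the advisory describes.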

Why it matters

vLLM processes millions of inference requests daily across enterprises and cloud providers. Hardcoded trust overrides mean that security-conscious operators who explicitly disable remote code execution are still exposed. That this is the third related CVE in this codebase signals a systemic difficulty in auditing the sprawling model-loading surface.

What to do

  • Audit your vLLM deployment version; if running 0.14.1, check for patches or restrict model loading to vetted repositories only.
  • Network-isolate model serving infrastructure from untrusted internet sources.
  • If you auto-load models by identifier, monitor HuggingFace for new or changed uploads under the project names you depend on.
  • Track the fix status — no patch details were in the advisory at time of publication.
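
The audit and vetting steps above could be sketched roughly as follows. This is an illustrative script under stated assumptions, not an official vLLM tool: the function names and the allowlist are hypothetical, and the scan only catches calls that pass the literal keyword trust_remote_code=True.

```python
from __future__ import annotations

import ast
import sys
from importlib import metadata
from pathlib import Path


def find_hardcoded_trust(source: str, filename: str = "<string>") -> list[int]:
    """Return sorted line numbers where a call passes trust_remote_code=True."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "trust_remote_code"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                hits.append(kw.value.lineno)
    return sorted(hits)


def scan_package(package_dir: Path) -> dict[str, list[int]]:
    """Scan every .py file under package_dir; skip files that fail to parse."""
    results = {}
    for path in sorted(package_dir.rglob("*.py")):
        try:
            lines = find_hardcoded_trust(path.read_text(encoding="utf-8"), str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue
        if lines:
            results[str(path)] = lines
    return results


def installed_version(dist_name: str = "vllm") -> str | None:
    """Return the installed distribution version, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_vetted(repo_id: str, allowlist: set[str]) -> bool:
    """Exact-match check against a hypothetical allowlist of reviewed repos."""
    return repo_id in allowlist


if __name__ == "__main__":
    print("installed vllm version:", installed_version())
    # Optional argument: path to the installed vllm package directory.
    if len(sys.argv) > 1:
        for file, lines in scan_package(Path(sys.argv[1])).items():
            print(file, lines)
```

Run it with the path to the installed package directory as the only argument. A literal scan like this will miss values passed through variables or **kwargs, so an empty result is not proof that a deployment is unaffected.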
