CVE-2026-22778 (GHSA-4r2x-xpjr-7cvv) is a critical remote code execution chain in vLLM, the widely used LLM serving framework with 3M+ monthly PyPI downloads.
Attack vector: send a crafted video URL to vLLM's OpenAI-compatible Completions or Invocations endpoint → vLLM downloads the video → a malicious JPEG2000 cdef box triggers a heap overflow in the OpenCV/FFmpeg decoder → arbitrary command execution on the server.
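The request shape that reaches the media pipeline can be sketched as below. The `video_url` content type follows vLLM's OpenAI-compatible multimodal chat schema; the model name and URL are placeholders, not values from the advisory. Any reachable attacker-controlled URL in this slot causes the server to download and decode the bytes.

```python
import json

# Hedged sketch of a video request to an OpenAI-compatible vLLM endpoint.
# Model name and URL are illustrative placeholders.
def build_video_request(model: str, video_url: str) -> dict:
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this video."},
                # The server fetches and decodes whatever this URL serves.
                {"type": "video_url", "video_url": {"url": video_url}},
            ],
        }],
    }

body = build_video_request("example/video-model", "https://example.com/clip.mp4")
print(json.dumps(body)[:40])
```

Nothing in this body authenticates the media itself; the server's decoder is what decides how the downloaded bytes are interpreted.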
The chain exploits two bugs: (1) a PIL error message info leak that exposes memory addresses and bypasses ASLR, and (2) a JPEG2000 cdef box heap overflow where the Y (luma) plane is remapped into the smaller U (chroma) buffer, causing a controlled 0.75×W×H byte overflow.
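The 0.75×W×H figure follows directly from 4:2:0 subsampling arithmetic. A minimal sketch, assuming 8-bit samples and standard 4:2:0 chroma planes (this is the size math only, not the decoder's code):

```python
# Why remapping the full-resolution Y plane into a 4:2:0 chroma buffer
# overflows by 0.75 * W * H bytes. Assumes 8-bit samples; illustrative only.
def plane_bytes(width: int, height: int, subsampled: bool) -> int:
    """Byte size of one plane; 4:2:0 chroma is halved in both dimensions."""
    if subsampled:
        return (width // 2) * (height // 2)
    return width * height

def overflow_bytes(width: int, height: int) -> int:
    """Bytes written past the chroma buffer if the luma plane lands in it."""
    luma = plane_bytes(width, height, subsampled=False)    # W * H
    chroma = plane_bytes(width, height, subsampled=True)   # W * H / 4
    return luma - chroma                                   # 0.75 * W * H

print(overflow_bytes(640, 480))  # 0.75 * 640 * 480 = 230400
```

Because the attacker picks W and H in the crafted file, the overflow length is attacker-controlled.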
Zero authentication required: default vLLM instances have no API key. Even when an API key is configured (a non-default setting), the Invocations route processes the payload before authentication.
Only deployments serving a video-capable model are affected — text-only or image-only endpoints are not vulnerable.
The vulnerability was fixed in vLLM 0.14.1. All prior versions serving video models are affected.
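A minimal version gate against the fixed release can be sketched as follows. Parsing here is deliberately simplified (no pre-release or dev-suffix handling); a production check could use `packaging.version` instead.

```python
# Hedged sketch: compare an installed vLLM version string against the
# first fixed release, 0.14.1. Simplified parsing; illustrative only.
FIXED_RELEASE = (0, 14, 1)

def is_patched(version: str) -> bool:
    """True if `version` is at or above 0.14.1 (plain x.y.z strings only)."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts >= FIXED_RELEASE

print(is_patched("0.14.1"), is_patched("0.13.2"))  # True False
```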
The root cause sits deep in the media pipeline: vLLM hands raw downloaded bytes to cv2.VideoCapture for decoding, trusting the container format without any sanitization.
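A hedged reconstruction of that pattern, with illustrative names rather than vLLM's actual code: untrusted bytes go to a temp file, and the decoder itself sniffs the real format.

```python
import tempfile

# Illustrative sketch of the vulnerable pattern: attacker-controlled bytes
# are written out and handed to the decoder with no container validation.
# Function and variable names are hypothetical, not vLLM internals.
def stage_untrusted_video(raw: bytes) -> str:
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
        f.write(raw)  # format of `raw` is never checked
        path = f.name
    # In the real pipeline this is roughly: cv2.VideoCapture(path) -> frames.
    # OpenCV/FFmpeg identify the format from the bytes themselves, so a
    # JPEG2000 payload is decoded even though the request claimed "video".
    return path

staged = stage_untrusted_video(b"\x00\x00\x00\x0cjP  \r\n\x87\n")  # JP2 magic
```

The suffix on the temp file is cosmetic; the decoder ignores it, which is exactly why the JPEG2000 path is reachable.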
Why it matters
vLLM is the dominant open-source LLM serving engine — used in production by thousands of organizations. A pre-auth RCE in it means attackers can compromise AI inference infrastructure with a single HTTP request.
Multimodal models (video, image) are rapidly being adopted, expanding vLLM's attack surface beyond text-only deployments. This is the fourth vLLM CVE in two months (after 22773, 24779, 22807).
The attack requires no credentials, no user interaction, and no local access — just a reachable vLLM endpoint accepting video input.
What to do
Upgrade immediately to vLLM ≥ 0.14.1 if you serve video-capable models.
Enable API authentication: never expose vLLM without an API key, and restrict the Invocations route.
Network segmentation: vLLM endpoints should not be directly internet-facing. Place them behind an API gateway with auth and rate limiting.
Audit multimodal inputs: consider pre-validating or sandboxing media processing before it reaches the inference engine.
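The last recommendation can be sketched as a magic-byte allowlist applied before bytes ever reach cv2/FFmpeg. The signatures below cover only two common containers and are an assumption of this sketch, not an exhaustive list; a hardened deployment would also sandbox the decoder itself.

```python
# Hedged sketch: reject downloaded media whose magic bytes don't match an
# allowlist of expected containers, before handing them to the decoder.
ALLOWED_SIGNATURES = {
    # ISO BMFF (MP4): bytes 4..8 are the "ftyp" box tag.
    "mp4": lambda b: len(b) >= 12 and b[4:8] == b"ftyp",
    # WebM/Matroska: EBML header magic.
    "webm": lambda b: b.startswith(b"\x1a\x45\xdf\xa3"),
}

def looks_like_allowed_container(data: bytes) -> bool:
    """True only if the payload's magic bytes match an allowed container."""
    return any(check(data) for check in ALLOWED_SIGNATURES.values())

# A JPEG2000 file (magic \x00\x00\x00\x0cjP  \r\n\x87\n) fails both checks,
# so it would be dropped instead of reaching the vulnerable decoder path.
print(looks_like_allowed_container(b"\x00\x00\x00\x0cjP  \r\n\x87\n"))  # False
```

This is defense in depth, not a substitute for upgrading: it narrows the formats an attacker can smuggle in, but the authoritative fix is vLLM ≥ 0.14.1.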