SGLang CVE-2026-5760 — RCE via Malicious GGUF Model Files
AI relevance: SGLang is one of the fastest-growing open-source inference frameworks for serving LLMs — this flaw means anyone loading community-sourced GGUF models from Hugging Face or other repositories risks immediate remote code execution on their inference server.
- CVE-2026-5760 carries a CVSS score of 9.8 (Critical) and enables unauthenticated remote code execution in SGLang.
- The attack vector is a malicious GGUF model file with a crafted `tokenizer.chat_template` field containing a Jinja2 server-side template injection (SSTI) payload.
- Exploitation targets the reranker serving endpoint at `entrypoints/openai/serving_rerank.py` — the template is processed unsafely when the reranker code path is triggered.
- The Qwen3 reranker trigger phrase activates the vulnerable code path, meaning models positioned as Qwen-compatible rerankers are the most likely carriers.
- GGUF is widely used for quantized local model deployment. Hugging Face hosts thousands of community-published GGUF files with minimal validation.
- The vulnerability is network-reachable without authentication — any SGLang instance exposing its OpenAI-compatible API to the internet is at risk.
- This represents a new class of AI supply-chain attack: the model artifact itself is the exploit carrier, not a dependency package or configuration file.
- SGLang has seen rapid adoption for vLLM-alternative deployments, making this a high-impact issue for the AI inference landscape.
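The underlying flaw class is easy to demonstrate outside SGLang: rendering an attacker-controlled Jinja2 template with a standard (non-sandboxed) environment lets the template walk from an ordinary string literal up to arbitrary Python classes, while Jinja2's `SandboxedEnvironment` rejects the same attribute chain. A minimal sketch — the payload below is a generic SSTI probe of the kind that could sit in a `chat_template` field, not the actual CVE exploit:

```python
from jinja2 import Environment
from jinja2.sandbox import SandboxedEnvironment
from jinja2.exceptions import SecurityError

# Generic Jinja2 SSTI probe: walks from a string literal up through
# Python's type hierarchy to enumerate every loaded class.
payload = "{{ ''.__class__.__mro__[1].__subclasses__() }}"

# Unsafe: a plain Environment evaluates the dunder attribute chain,
# handing the template access to Python object internals.
unsafe = Environment().from_string(payload).render()
print(unsafe[:60])  # prints the start of a list of live Python classes

# Safer: SandboxedEnvironment blocks unsafe attribute access at render
# time and raises SecurityError instead of evaluating the chain.
try:
    SandboxedEnvironment().from_string(payload).render()
except SecurityError as exc:
    print("blocked:", exc)
```

The design point is that template rendering of untrusted metadata should never use a bare `Environment`; sandboxing (or refusing to render embedded templates at all) removes the class-walk primitive that SSTI payloads rely on.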
Why it matters
This extends the AI supply-chain attack surface beyond Python packages and MCP servers into the model files themselves. Organizations that pull GGUF models from public repos for local inference are running untrusted binary formats through processing pipelines that execute embedded template code — essentially treating model files as executable content. With CVSS 9.8 and no authentication required, this is a top-tier risk for anyone running SGLang in production.
What to do
- Update SGLang to the patched version immediately.
- Do not load GGUF models from untrusted sources into internet-facing SGLang instances.
- Review whether your reranker endpoints are exposed externally — restrict access with network-level controls.
- If you use SGLang's OpenAI-compatible API, verify the reranker endpoint is behind authentication or disabled if unused.
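As a stopgap while rolling out the patch, some teams add a coarse pre-load check on model metadata. The sketch below is a hypothetical heuristic — the function name and token list are our own, not part of SGLang — that flags `chat_template` strings containing common Jinja2 SSTI primitives. It reduces exposure to copy-paste payloads but is trivially evadable and is not a substitute for upgrading:

```python
# Hypothetical heuristic: flag chat_template strings containing
# object-walking primitives common in Jinja2 SSTI payloads.
# The token list is illustrative, not exhaustive; a determined
# attacker can obfuscate around any blocklist like this.
SUSPICIOUS_TOKENS = (
    "__class__", "__mro__", "__subclasses__", "__globals__",
    "__builtins__", "__import__", "getattr", "os.system",
)

def chat_template_looks_suspicious(template: str) -> bool:
    """Return True if the template contains a known-dangerous token."""
    return any(tok in template for tok in SUSPICIOUS_TOKENS)

# Usage: refuse to load a model whose embedded template trips the check.
benign = "{% for m in messages %}{{ m.role }}: {{ m.content }}{% endfor %}"
hostile = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
print(chat_template_looks_suspicious(benign))   # False
print(chat_template_looks_suspicious(hostile))  # True
```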