Orca Security — Pickle RCE in SGLang LLM Framework (CVE-2026-3059/3060)
AI relevance: SGLang is a widely deployed open-source framework for serving LLMs and multimodal models; pickle deserialization on network-exposed ZMQ ports gives any network-adjacent attacker unauthenticated RCE on GPU inference servers running production AI workloads.
- CVE-2026-3059 (CVSS 9.8): unsafe `pickle.loads()` on untrusted data received via ZMQ in the multimodal generation broker (`scheduler_client.py`). No authentication required; network-reachable.
- CVE-2026-3060 (CVSS 9.8): the same pickle deserialization flaw in the disaggregation encoder receiver (`encode_receiver.py`), also network-reachable and unauthenticated.
- A third flaw, CVE-2026-3989 (CVSS 7.8), covers insecure pickle in the crash-dump replay utility (`replay_request_dump.py`); it is local-only but still exploitable.
- All three stem from Python's `pickle` module processing data from network sockets, a pattern Orca calls "Pickle in the Pipeline" and considers endemic across ML/AI infrastructure.
- Orca disclosed through CERT/CC (case VU#665416). At time of publication, no official SGLang patch exists and the maintainers have not responded to coordinated disclosure.
- The proposed patch (unmerged) replaces pickle with a safe serialization format, but operators must manually apply it or block network access to the affected ports.
- SGLang competes with vLLM and TGI for LLM serving; deployments often expose ZMQ ports on shared GPU clusters, making network-adjacent exploitation realistic.
- The broader pattern — pickle on untrusted network data — repeats across MLflow, Ray, PyTorch, and other ML frameworks, making this a systemic supply-chain class issue.
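The mechanics of this bug class are easy to demonstrate. The sketch below uses a benign stand-in callable (`abs` instead of something like `os.system`) to show how a pickle stream can name any importable callable plus arguments, which `pickle.loads()` will then invoke; the `Gadget` class name is illustrative and not taken from the advisory or the SGLang code.

```python
import pickle

# Benign illustration of the "Pickle in the Pipeline" flaw: a pickle stream
# encodes (callable, args), and pickle.loads() CALLS that callable during
# deserialization. A real attacker would return (os.system, ("<cmd>",)).
class Gadget:
    def __reduce__(self):
        # Attacker-controlled: which callable runs, and with what arguments.
        return (abs, (-42,))

payload = pickle.dumps(Gadget())  # the bytes an attacker would send over ZMQ
result = pickle.loads(payload)    # deserialization executes abs(-42)
print(result)                     # 42 — loads() ran attacker-chosen code
```

Because the callable is resolved by import path at load time, no `Gadget` class needs to exist on the victim; the payload bytes alone are sufficient, which is why an unauthenticated ZMQ endpoint calling `pickle.loads()` is directly remote code execution.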
Why it matters
Pickle deserialization bugs in ML infrastructure are not new, but they keep appearing because the Python ML ecosystem defaults to pickle for inter-process communication. SGLang's case is particularly dangerous because the vulnerable ports are network-exposed by design (for multimodal and disaggregated serving), require zero authentication, and sit on GPU servers that typically hold model weights, API keys, and access to training data stores. A single unauthenticated packet turns an inference server into a foothold for lateral movement across the AI cluster. The lack of a vendor patch at disclosure time is a reminder that open-source AI frameworks often lack mature security response processes.
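The fix direction is to serialize only plain data, never object graphs. The article says the unmerged patch replaces pickle with a safe serialization format but does not specify which one; the sketch below uses JSON purely as an illustration of the pattern, and the function names are mine, not from the patch.

```python
import json

# Safe-pattern sketch: only dicts/lists/strings/numbers cross the wire,
# so deserialization can never trigger code execution.
def encode_msg(msg: dict) -> bytes:
    return json.dumps(msg).encode("utf-8")

def decode_msg(raw: bytes) -> dict:
    obj = json.loads(raw.decode("utf-8"))  # data only, no callables
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    return obj

wire = encode_msg({"req_id": 7, "prompt": "hello"})
print(decode_msg(wire))  # {'req_id': 7, 'prompt': 'hello'}
```

The trade-off is that structured types (tensors, custom classes) must be explicitly converted to plain data on the sending side, which is exactly what removes the deserialization attack surface.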
What to do
- Block exposed ports: Immediately firewall ZMQ ports used by SGLang's multimodal broker and disaggregation encoder. Restrict access to only trusted cluster nodes.
- Apply the unmerged patch: Review the proposed fix from Orca/CERT/CC and consider cherry-picking it if you run SGLang in production.
- Scan for pickle usage: Audit your broader AI/ML stack for `pickle.loads()` on network-received data. Orca's research notes this is a systemic pattern.
- Network segmentation: Isolate inference-serving infrastructure from general network access. GPU servers should not be reachable from untrusted segments.
- Monitor for exploitation: Look for unexpected ZMQ connections, anomalous Python child processes on inference nodes, or outbound network calls from SGLang hosts.
- Consider alternatives: If SGLang maintainers remain unresponsive, evaluate whether vLLM or TGI better meet your security response requirements.
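The pickle-usage audit above can be partly automated. A minimal static check, written here as an assumption-laden sketch (it only flags literal `pickle.loads(...)` attribute calls and will miss aliased imports or `pickle.Unpickler` usage):

```python
import ast

def find_pickle_loads(source: str, filename: str = "<unknown>"):
    """Return (line, col) of every pickle.loads(...) call in Python source."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "loads"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "pickle"):
            hits.append((node.lineno, node.col_offset))
    return hits

sample = "import pickle\ndata = pickle.loads(sock.recv())\n"
print(find_pickle_loads(sample))  # [(2, 7)]
```

Run it across a repository's `.py` files and manually triage each hit for whether the input bytes can originate from a network socket.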