Orca Security — Pickle RCE in SGLang LLM Framework (CVE-2026-3059/3060)

AI relevance: SGLang is a widely deployed open-source framework for serving LLMs and multimodal models; pickle deserialization on network-exposed ZMQ ports gives any network-adjacent attacker unauthenticated RCE on GPU inference servers running production AI workloads.

  • CVE-2026-3059 (CVSS 9.8): unsafe pickle.loads() on untrusted data received via ZMQ in the multimodal generation broker (scheduler_client.py). No authentication required; network-reachable.
  • CVE-2026-3060 (CVSS 9.8): same pickle deserialization flaw in the disaggregation encoder receiver (encode_receiver.py), also network-reachable and unauthenticated.
  • A third flaw, CVE-2026-3989 (CVSS 7.8), covers insecure pickle deserialization in the crash-dump replay utility (replay_request_dump.py); it is local-only but still exploitable.
  • All three stem from Python's pickle module processing data from network sockets — a pattern Orca calls "Pickle in the Pipeline," endemic across ML/AI infrastructure.
  • Orca disclosed the flaws through CERT/CC (case VU#665416). At time of publication, no official SGLang patch exists and the maintainers have not responded to coordinated disclosure.
  • The proposed patch (unmerged) replaces pickle with a safe serialization format, but operators must manually apply it or block network access to the affected ports.
  • SGLang competes with vLLM and TGI for LLM serving; deployments often expose ZMQ ports on shared GPU clusters, making network-adjacent exploitation realistic.
  • The broader pattern — pickle on untrusted network data — repeats across MLflow, Ray, PyTorch, and other ML frameworks, making this a systemic supply-chain class issue.
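The bug class behind all three CVEs can be shown in a few lines. This is a minimal, self-contained illustration of why pickle.loads() on attacker-reachable bytes is RCE, not SGLang's actual code: pickle's `__reduce__` protocol lets serialized data name an arbitrary callable that runs during deserialization.

```python
import pickle

hits = []

def record(msg):
    # Stand-in for attacker code; a real payload would invoke os.system,
    # subprocess, or similar with an attacker-chosen command.
    hits.append(msg)

class Malicious:
    def __reduce__(self):
        # pickle will call record(...) during deserialization
        return (record, ("attacker-controlled call",))

blob = pickle.dumps(Malicious())   # what an attacker would send over ZMQ
pickle.loads(blob)                 # the callable executes here, before any type check
print(hits)                        # → ['attacker-controlled call']
```

Because the callable runs before the receiver can inspect the deserialized object, no validation after `pickle.loads()` can make the pattern safe; the only fix is to not deserialize untrusted bytes with pickle at all.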

Why it matters

Pickle deserialization bugs in ML infrastructure are not new, but they keep appearing because the Python ML ecosystem defaults to pickle for inter-process communication. SGLang's case is particularly dangerous because the vulnerable ports are network-exposed by design (for multimodal and disaggregated serving), require zero authentication, and sit on GPU servers that typically hold model weights, API keys, and access to training data stores. A single unauthenticated packet turns an inference server into a foothold for lateral movement across the AI cluster. The lack of a vendor patch at disclosure time is a reminder that open-source AI frameworks often lack mature security response processes.
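The remediation direction described above, replacing pickle with a safe serialization format, amounts to parsing untrusted bytes with a codec that cannot execute code and rejecting anything off-schema. A minimal sketch, assuming a hypothetical message schema (the field names here are illustrative, not taken from the actual proposed patch):

```python
import json

def safe_decode(raw: bytes) -> dict:
    """Decode an untrusted message without code execution.

    json.loads never invokes callables from the payload; malformed or
    unexpected input raises instead of running attacker code.
    """
    msg = json.loads(raw.decode("utf-8"))
    # Hypothetical schema check: reject anything that isn't the expected shape
    if not isinstance(msg, dict) or "request_id" not in msg:
        raise ValueError("rejected message with unexpected shape")
    return msg

print(safe_decode(b'{"request_id": "abc", "prompt": "hi"}'))
```

The same idea applies to msgpack or protobuf; the essential property is that deserialization is data-only and validation happens before the message reaches any business logic.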

What to do

  • Block exposed ports: Immediately firewall ZMQ ports used by SGLang's multimodal broker and disaggregation encoder. Restrict access to only trusted cluster nodes.
  • Apply the unmerged patch: Review the proposed fix from Orca/CERT/CC and consider cherry-picking it if you run SGLang in production.
  • Scan for pickle usage: Audit your broader AI/ML stack for pickle.loads() on network-received data. Orca's research notes this is a systemic pattern.
  • Network segmentation: Isolate inference-serving infrastructure from general network access. GPU servers should not be reachable from untrusted segments.
  • Monitor for exploitation: Look for unexpected ZMQ connections, anomalous Python child processes on inference nodes, or outbound network calls from SGLang hosts.
  • Consider alternatives: If SGLang maintainers remain unresponsive, evaluate whether vLLM or TGI better meet your security response requirements.
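The "scan for pickle usage" step above can be approximated with a small AST pass over your own codebase. This sketch catches the common `pickle.load()`/`pickle.loads()` spelling; it will miss aliased imports and indirect calls, so treat it as a starting point rather than a complete audit:

```python
import ast

def find_pickle_loads(source: str, filename: str = "<src>"):
    """Return (filename, lineno) for calls that look like pickle.load(s)()."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in ("load", "loads")
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "pickle"):
            findings.append((filename, node.lineno))
    return findings

# Hypothetical handler resembling the vulnerable pattern (not SGLang's code)
sample = (
    "import pickle\n"
    "\n"
    "def handler(sock):\n"
    "    msg = sock.recv()\n"
    "    return pickle.loads(msg)\n"
)
print(find_pickle_loads(sample, "handler.py"))  # → [('handler.py', 5)]
```

Running this across every .py file in a deployment (and flagging any hit whose input traces back to a socket) gives a quick first-pass inventory of the "Pickle in the Pipeline" pattern Orca describes.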

Sources