SecurityWeek — Critical Ollama CVE-2026-7482 Exposes 300K Deployments
AI relevance: Ollama is the most popular self-hosted LLM inference engine — this CVSS 9.3 vulnerability exposes prompts, API keys, and secrets from roughly 300,000 internet-facing deployments to unauthenticated remote attackers.
- CVE-2026-7482 (CVSS 9.3), dubbed "Bleeding Llama" by Cyera, is a heap out-of-bounds read in Ollama's GGUF model loader.
- The bug triggers when an attacker supplies a crafted GGUF file whose tensor metadata declares an offset and size larger than the actual file length, causing Ollama to read past the allocated heap buffer (see the sanity-check sketch after this list).
- Heap memory accessed this way can contain sensitive data: user prompts, conversation history, environment variables, API keys, tokens, and secrets used by the deployment.
- Attackers can then use Ollama's built-in model push feature to exfiltrate the leaked memory to an attacker-controlled server; the full attack chain requires only three unauthenticated API calls.
- Ollama ships with no authentication at all, and common deployment patterns (the official Docker image, or setting OLLAMA_HOST=0.0.0.0 for remote access) bind it to all network interfaces, so every internet-exposed instance is open to this attack without additional protections.
- Cyera estimates approximately 300,000 Ollama servers are currently accessible on the public internet.
- Successful exploitation could expose employee interactions with AI assistants, development code, tool outputs routed through the model, and prompts containing PII, PHI, or other regulated data.
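To make the bug class concrete, below is a minimal pre-load sanity check written against the public GGUF specification: it walks the header, skips the metadata key/value section, and flags any tensor-info record whose declared data offset already lies past the end of the file. This is a sketch of the malformed-metadata pattern described above, not a reproduction of the exploit (no proof of concept is public), and the specific heuristic — flagging an offset at or beyond EOF — is an assumption; a file can pass this check and still be hostile in other ways.

```python
import os
import struct
import sys

GGUF_MAGIC = b"GGUF"

# Byte widths of the fixed-size GGUF metadata value types (public GGUF spec);
# type 8 (string) and type 9 (array) are variable-length and handled below.
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def u32(f): return struct.unpack("<I", f.read(4))[0]
def u64(f): return struct.unpack("<Q", f.read(8))[0]

def skip_string(f):
    f.seek(u64(f), os.SEEK_CUR)           # uint64 length prefix + UTF-8 bytes

def skip_value(f, vtype):
    if vtype in SCALAR_SIZES:
        f.seek(SCALAR_SIZES[vtype], os.SEEK_CUR)
    elif vtype == 8:                       # string
        skip_string(f)
    elif vtype == 9:                       # array: element type, count, elements
        etype, count = u32(f), u64(f)
        for _ in range(count):
            skip_value(f, etype)
    else:
        raise ValueError(f"unknown metadata value type {vtype}")

def check(path):
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            raise ValueError("not a GGUF file")
        u32(f)                             # format version
        n_tensors, n_kv = u64(f), u64(f)
        for _ in range(n_kv):              # skip the metadata key/value section
            skip_string(f)
            skip_value(f, u32(f))
        for _ in range(n_tensors):         # tensor-info records
            name = f.read(u64(f)).decode("utf-8", errors="replace")
            n_dims = u32(f)
            dims = [u64(f) for _ in range(n_dims)]
            _ggml_type, offset = u32(f), u64(f)
            # `offset` is relative to the data section, which itself starts
            # after this header, so an offset at or past the file size is
            # unconditionally out of bounds regardless of alignment.
            if offset >= file_size:
                print(f"SUSPICIOUS: tensor {name!r} (dims {dims}) declares "
                      f"offset {offset}, but the file is {file_size} bytes")

if __name__ == "__main__":
    check(sys.argv[1])
```

Run as `python gguf_check.py model.gguf` before importing model files from untrusted sources; the offset test is deliberately conservative so it holds whatever alignment the file declares.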
Why it matters
Ollama is the go-to tool for running open-weight LLMs locally and in development environments. Its out-of-the-box configuration (no authentication, and all-interfaces binding in common Docker and remote-access setups) was designed for convenience, not production security. With 300,000 exposed instances and a trivially exploitable three-call attack, the attack surface is massive. Any organization using Ollama to process internal data, credentials, or user prompts should treat internet-exposed instances as already compromised.
What to do
- Upgrade Ollama to v0.17.1 or later immediately.
- Audit all running Ollama instances for internet exposure (a probe sketch follows this list); assume memory on any publicly accessible instance has already been read.
- Deploy an authentication proxy (e.g., nginx with auth) and restrict network access to trusted subnets.
- Rotate any API keys, tokens, or secrets that were available in environment variables or prompt context on exposed instances.
- Consider network segmentation to isolate LLM inference endpoints from broader infrastructure.
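As a starting point for the audit step, here is a minimal probe sketch using the documented Ollama endpoint GET /api/version on the default port 11434: any successful unauthenticated response means the API is exposed, and the reported version is compared against the fixed release named above. The target list, timeout, and version-parsing shortcut are illustrative assumptions; run it only against infrastructure you own.

```python
import sys
import requests

FIXED = (0, 17, 1)  # first patched release, per the advisory above

def parse_version(v: str) -> tuple:
    # "0.16.3" -> (0, 16, 3); non-numeric suffixes (rc tags etc.) are dropped.
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts[:3])

def audit(host: str, port: int = 11434, timeout: float = 3.0) -> None:
    url = f"http://{host}:{port}/api/version"
    try:
        resp = requests.get(url, timeout=timeout)
    except requests.RequestException:
        print(f"{host}: no answer on {port} (filtered or not running)")
        return
    try:
        version = resp.json().get("version", "unknown")
    except ValueError:
        print(f"{host}: port {port} open, but not the Ollama API")
        return
    # Any successful unauthenticated response means the API is exposed.
    if parse_version(version) >= FIXED:
        print(f"{host}: EXPOSED, version {version} (patched, but unauthenticated)")
    else:
        print(f"{host}: EXPOSED, version {version} (VULNERABLE: upgrade, rotate secrets)")

if __name__ == "__main__":
    for target in sys.argv[1:]:
        audit(target)
```

Note that a patched version removes the memory-disclosure bug but not the exposure: an instance that answers this probe from the internet still serves its full model API to anyone, so the proxy and segmentation steps above apply regardless.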