SecurityWeek — Critical Ollama CVE-2026-7482 Exposes 300K Deployments
AI relevance: Ollama is the most popular self-hosted LLM inference engine — this CVSS 9.3 vulnerability exposes prompts, API keys, and secrets from roughly 300,000 internet-facing deployments to unauthenticated remote attackers.
- CVE-2026-7482 (CVSS 9.3), dubbed "Bleeding Llama" by Cyera, is a heap out-of-bounds read in Ollama's GGUF model loader.
- The bug triggers when an attacker supplies a crafted GGUF file whose tensor metadata declares an offset and size larger than the actual file length, causing Ollama to read past the allocated heap buffer (see the sanity-check sketch after this list).
- Heap memory accessed this way can contain sensitive data: user prompts, conversation history, environment variables, API keys, tokens, and secrets used by the deployment.
- Attackers can then use Ollama's built-in model push feature to exfiltrate the leaked memory to an attacker-controlled server; the full attack chain requires only three unauthenticated API calls.
- Ollama ships with no authentication at all, and common deployment patterns (the official Docker image, or setting OLLAMA_HOST=0.0.0.0 for remote access) bind it to all network interfaces, so every internet-exposed instance is open to this attack without additional protections.
- Cyera estimates approximately 300,000 Ollama servers are currently accessible on the public internet.
- Successful exploitation could expose employee interactions with AI assistants, development code, tool outputs routed through the model, and prompts containing PII, PHI, or other regulated data.
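To make the bug class concrete, below is a minimal pre-load sanity check written against the public GGUF specification: it walks the header, skips the metadata key/value section, and flags any tensor-info record whose declared data offset already lies past the end of the file. This is a sketch of the malformed-metadata pattern described above, not a reproduction of the exploit (no proof of concept is public), and the specific heuristic — flagging an offset at or beyond EOF — is an assumption; a file can pass this check and still be hostile in other ways.

```python
import os
import struct
import sys

GGUF_MAGIC = b"GGUF"

# Byte widths of the fixed-size GGUF metadata value types (public GGUF spec);
# type 8 (string) and type 9 (array) are variable-length and handled below.
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def u32(f): return struct.unpack("<I", f.read(4))[0]
def u64(f): return struct.unpack("<Q", f.read(8))[0]

def skip_string(f):
    f.seek(u64(f), os.SEEK_CUR)           # uint64 length prefix + UTF-8 bytes

def skip_value(f, vtype):
    if vtype in SCALAR_SIZES:
        f.seek(SCALAR_SIZES[vtype], os.SEEK_CUR)
    elif vtype == 8:                       # string
        skip_string(f)
    elif vtype == 9:                       # array: element type, count, elements
        etype, count = u32(f), u64(f)
        for _ in range(count):
            skip_value(f, etype)
    else:
        raise ValueError(f"unknown metadata value type {vtype}")

def check(path):
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            raise ValueError("not a GGUF file")
        u32(f)                             # format version
        n_tensors, n_kv = u64(f), u64(f)
        for _ in range(n_kv):              # skip the metadata key/value section
            skip_string(f)
            skip_value(f, u32(f))
        for _ in range(n_tensors):         # tensor-info records
            name = f.read(u64(f)).decode("utf-8", errors="replace")
            n_dims = u32(f)
            dims = [u64(f) for _ in range(n_dims)]
            _ggml_type, offset = u32(f), u64(f)
            # `offset` is relative to the data section, which itself starts
            # after this header, so an offset at or past the file size is
            # unconditionally out of bounds regardless of alignment.
            if offset >= file_size:
                print(f"SUSPICIOUS: tensor {name!r} (dims {dims}) declares "
                      f"offset {offset}, but the file is {file_size} bytes")

if __name__ == "__main__":
    check(sys.argv[1])
```

Run as `python gguf_check.py model.gguf` before importing model files from untrusted sources; the offset test is deliberately conservative so it holds whatever alignment the file declares.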
Why it matters
Ollama is the go-to tool for running open-weight LLMs locally and in development environments. Its out-of-the-box configuration (no authentication, and all-interfaces binding in common Docker and remote-access setups) was designed for convenience, not production security. With 300,000 exposed instances and a trivially exploitable three-call attack, the attack surface is massive. Any organization using Ollama to process internal data, credentials, or user prompts should treat internet-exposed instances as already compromised.
What to do
- Upgrade Ollama to v0.17.1 or later immediately.
- Audit all running Ollama instances for internet exposure (a probe sketch follows this list); assume memory on any publicly accessible instance has already been read.
- Deploy an authentication proxy (e.g., nginx with auth) and restrict network access to trusted subnets.
- Rotate any API keys, tokens, or secrets that were available in environment variables or prompt context on exposed instances.
- Consider network segmentation to isolate LLM inference endpoints from broader infrastructure.
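As a starting point for the audit step, here is a minimal probe sketch using the documented Ollama endpoint GET /api/version on the default port 11434: any successful unauthenticated response means the API is exposed, and the reported version is compared against the fixed release named above. The target list, timeout, and version-parsing shortcut are illustrative assumptions; run it only against infrastructure you own.

```python
import sys
import requests

FIXED = (0, 17, 1)  # first patched release, per the advisory above

def parse_version(v: str) -> tuple:
    # "0.16.3" -> (0, 16, 3); non-numeric suffixes (rc tags etc.) are dropped.
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts[:3])

def audit(host: str, port: int = 11434, timeout: float = 3.0) -> None:
    url = f"http://{host}:{port}/api/version"
    try:
        resp = requests.get(url, timeout=timeout)
    except requests.RequestException:
        print(f"{host}: no answer on {port} (filtered or not running)")
        return
    try:
        version = resp.json().get("version", "unknown")
    except ValueError:
        print(f"{host}: port {port} open, but not the Ollama API")
        return
    # Any successful unauthenticated response means the API is exposed.
    if parse_version(version) >= FIXED:
        print(f"{host}: EXPOSED, version {version} (patched, but unauthenticated)")
    else:
        print(f"{host}: EXPOSED, version {version} (VULNERABLE: upgrade, rotate secrets)")

if __name__ == "__main__":
    for target in sys.argv[1:]:
        audit(target)
```

Note that a patched version removes the memory-disclosure bug but not the exposure: an instance that answers this probe from the internet still serves its full model API to anyone, so the proxy and segmentation steps above apply regardless.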