GreyNoise — 91,403 Attack Sessions Target Exposed LLM Infrastructure
AI relevance: Attackers are actively scanning and exploiting exposed LLM inference servers, MCP endpoints, and RAG pipelines — treating AI infrastructure the same way they treat any internet-facing service, often before teams realize it's reachable from the public internet.
What happened
- GreyNoise honeypot infrastructure captured 91,403 attack sessions targeting exposed LLM endpoints between October 2025 and January 2026, across two distinct campaigns systematically mapping AI deployment attack surfaces.
- Discovery is trivial: default ports (11434 for Ollama, 8000 for vLLM) make fingerprinting straightforward, and most teams deploy inference servers with out-of-the-box configurations that announce themselves to internet scanners.
- A separate 293-day investigation covering 7.23 million observations found 175,000 unique Ollama hosts publicly accessible across 130 countries — production AI servers belonging to real organizations.
- Attackers pursue multiple exploitation paths simultaneously: compute abuse (running high-volume inference on victims' GPU resources), data exfiltration (extracting context-window contents including customer records and API keys), and lateral movement (using connected MCP servers and webhook integrations to reach internal infrastructure).
- RAG-based deployments are particularly dangerous: attackers don't need to access the underlying databases or document stores directly — they just need to ask the model the right question through an exposed endpoint.
- Common misconfigurations include exposed ports, unauthenticated APIs, and MCP servers with no access controls — teams often open ports for convenience during development and never close them when moving to production.
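To illustrate how straightforward the fingerprinting described above is, here is a minimal sketch of a banner check. It assumes two publicly documented behaviors: Ollama answers `GET /` on port 11434 with the plain-text banner "Ollama is running", and vLLM's OpenAI-compatible server lists models at `/v1/models` on port 8000. The host and marker strings are illustrative; none of this reproduces GreyNoise's actual tooling.

```python
import socket
import urllib.error
import urllib.request

# Service name -> (path, default port, response marker).
# Markers are assumptions based on documented default behavior.
FINGERPRINTS = {
    "ollama": ("/", 11434, b"Ollama is running"),
    "vllm": ("/v1/models", 8000, b'"object"'),
}


def probe(host: str, name: str, timeout: float = 2.0) -> bool:
    """Return True if the host answers with the expected service banner,
    False if the port is closed, unreachable, or serving something else."""
    path, port, marker = FINGERPRINTS[name]
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return marker in resp.read(4096)
    except (urllib.error.URLError, socket.timeout, OSError):
        return False


if __name__ == "__main__":
    # Probe your own hosts only; 127.0.0.1 is a placeholder.
    for svc in FINGERPRINTS:
        status = "EXPOSED" if probe("127.0.0.1", svc) else "not reachable"
        print(f"{svc}: {status}")
```

Running the same check from an external network vantage point (not just from inside your VPC) is what reveals the "reachable from the public internet" gap the campaigns exploited.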
Why it matters
The assumption that AI infrastructure is "internal" is collapsing under the weight of rapid deployment. Inference servers, vector databases, and MCP tool endpoints are being spun up at developer speed without corresponding security controls. The GreyNoise data shows attackers already treat LLM infrastructure as a first-class target — not a theoretical risk, but a systematically exploited one.
What to do
- Audit all LLM inference endpoints for internet exposure — check Ollama (11434), vLLM (8000), and custom API ports against your firewall rules.
- Require authentication on all model-serving endpoints, even those intended for internal use only.
- Review MCP server configurations for access controls; treat every tool-connected agent endpoint as internet-facing by default.
- Implement network segmentation between AI infrastructure and sensitive internal systems to prevent lateral movement through agent tool chains.
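The first two audit steps above can be sketched as a small script: for each inventoried endpoint, send an unauthenticated request and flag any that answer HTTP 200. The hostnames below are placeholders for your own inventory, and the paths assume Ollama's `/api/tags` model-listing route and vLLM's OpenAI-compatible `/v1/models` route; adjust both for custom deployments.

```python
import urllib.error
import urllib.request

# Placeholder inventory: (host, port, unauthenticated probe path).
ENDPOINTS = [
    ("ollama-host.internal", 11434, "/api/tags"),
    ("vllm-host.internal", 8000, "/v1/models"),
]


def auth_required(host: str, port: int, path: str, timeout: float = 3.0):
    """Return True if the endpoint rejects unauthenticated requests
    (401/403), False if it serves a 200 with no credentials, and
    None if it is unreachable."""
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status != 200
    except urllib.error.HTTPError as err:
        return err.code in (401, 403)
    except OSError:
        return None


if __name__ == "__main__":
    labels = {True: "auth enforced", False: "OPEN - no auth", None: "unreachable"}
    for host, port, path in ENDPOINTS:
        print(f"{host}:{port}{path} -> {labels[auth_required(host, port, path)]}")
```

A `False` result on any endpoint is the condition the attack data describes: an inference server that will answer anyone who finds the port.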