GreyNoise — Threat actors actively targeting exposed LLM endpoints

• Category: Security

  • What happened: GreyNoise summarized findings from an Ollama honeypot that recorded 91,403 attack sessions (Oct 2025–Jan 2026) aimed at AI/LLM deployment surfaces.
  • Campaign 1 (SSRF-style callbacks): attackers tried to force servers to make outbound “phone home” requests, including attempts via Ollama model pull URL handling (and co-occurring probes against Twilio SMS webhook patterns).
  • Validation channel: GreyNoise says attackers used ProjectDiscovery OAST callback domains to confirm the server made the outbound request.
  • Campaign 2 (enumeration): two IPs launched a high-volume probe across 73+ model endpoints, generating 80,469 sessions in ~11 days to fingerprint exposed LLM proxies.
  • Fingerprint prompts: the probe used deliberately innocuous queries (e.g., “How many states are there…?”, “What model are you?”), likely chosen to reveal which backend responds without tripping content filters.
  • Model coverage: the probe list included OpenAI-compatible and Gemini formats, spanning OpenAI, Anthropic, Meta (Llama), DeepSeek, Google (Gemini), Mistral, Qwen, xAI, etc.
  • Attribution posture: GreyNoise frames the SSRF/OAST campaign as possibly research/bug bounty behavior, but assesses the enumeration as a more concerning threat-actor recon pattern.
  • Defensive indicators: the post includes suggested blocks (OAST domains, IPs/ASNs, and JA4 fingerprints) and highlights egress filtering + rate limiting.
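The OAST validation channel above is detectable on your side: any outbound request from a model host to an interactsh-style callback domain is a strong signal that an SSRF probe succeeded. A minimal sketch, assuming a simple `timestamp src dst` egress log format (an illustrative assumption) and the commonly cited ProjectDiscovery interactsh default domains — confirm the current indicator list from the GreyNoise post before relying on it:

```python
# Sketch: flag outbound connections to ProjectDiscovery OAST (interactsh)
# callback domains in egress logs. Log format and domain list are
# illustrative assumptions, not a complete indicator set.

OAST_SUFFIXES = (
    ".oast.fun", ".oast.live", ".oast.me",
    ".oast.online", ".oast.pro", ".oast.site",
)

def find_oast_callbacks(log_lines):
    """Return (line_no, destination) pairs whose destination host ends in a
    known OAST suffix. Expects whitespace-separated 'timestamp src dst' lines."""
    hits = []
    for i, line in enumerate(log_lines, start=1):
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        dst = parts[2].lower()
        if dst.endswith(OAST_SUFFIXES):
            hits.append((i, dst))
    return hits

if __name__ == "__main__":
    sample = [
        "2026-01-10T12:00:01Z 10.0.0.5 registry.ollama.ai",
        "2026-01-10T12:00:02Z 10.0.0.5 abc123.oast.fun",
    ]
    for line_no, dst in find_oast_callbacks(sample):
        print(f"line {line_no}: OAST callback to {dst}")
```

In practice you would feed this from your firewall or DNS resolver logs; a hit from a host that has no business making arbitrary outbound requests is worth an immediate alert.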

Why it matters

  • LLM endpoints are becoming “internet services”: once you expose an OpenAI-compatible API, you inherit the same recon/scan lifecycle as any other service.
  • Misconfigured proxies are a real prize: if a proxy forwards to paid/commercial APIs, attackers can turn it into free inference (or a foothold for deeper access) via simple enumeration.
  • Egress is security: the SSRF angle is a reminder that outbound connectivity from model hosts can be the exploit confirmation path.

What to do

  1. Require auth + isolate: keep LLM APIs off the public internet where possible; where that isn't possible, enforce strong auth and per-tenant rate limits.
  2. Detect the “innocuous fingerprint” pattern: alert on rapid requests across many model names/endpoints, especially using the exact prompt strings GreyNoise highlighted.
  3. Defensive validation (safe): run an internal scan of your environment for accidentally exposed OpenAI-compatible routes (e.g., /v1/chat/completions, /v1/models) on non-edge hosts.
  4. Egress filtering: restrict model servers from making arbitrary outbound HTTP requests; explicitly allow only required registries and upstream APIs.
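Step 3 above can be sketched as a small stdlib-only scanner for your own network. The host list and Ollama default port here are illustrative assumptions; run this only against systems you are authorized to test:

```python
# Sketch: check internal hosts for accidentally exposed OpenAI-compatible
# routes. Hosts/ports are illustrative assumptions; authorized use only.
import urllib.error
import urllib.request

ROUTES = ("/v1/models", "/v1/chat/completions")

def probe_host(base_url, timeout=3):
    """Return (route, status) pairs for routes under base_url that answer.
    Any response other than 404 (including 401/403/405) confirms the
    endpoint is routable and worth reviewing."""
    exposed = []
    for route in ROUTES:
        url = base_url.rstrip("/") + route
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                exposed.append((route, resp.status))
        except urllib.error.HTTPError as e:
            if e.code != 404:  # 404 = route absent; 401/405 still confirm it
                exposed.append((route, e.code))
        except (urllib.error.URLError, OSError):
            pass  # host unreachable or connection refused
    return exposed

if __name__ == "__main__":
    # Example internal target on the default Ollama port (assumption).
    for host in ("http://10.0.0.12:11434",):
        for route, status in probe_host(host):
            print(f"{host}{route} -> HTTP {status}")
```

Note that a GET against /v1/chat/completions on a live proxy typically returns 405 (method not allowed), which still confirms exposure — the goal is inventory, not exploitation.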

Sources