GreyNoise — Threat actors actively targeting exposed LLM endpoints

• Category: Security

  • What happened: GreyNoise summarized findings from an Ollama honeypot that recorded 91,403 attack sessions (Oct 2025–Jan 2026) aimed at AI/LLM deployment surfaces.
  • Campaign 1 (SSRF-style callbacks): attackers tried to force servers to make outbound “phone home” requests, including attempts via Ollama model pull URL handling (and co-occurring probes against Twilio SMS webhook patterns).
  • Validation channel: GreyNoise says attackers used ProjectDiscovery OAST callback domains to confirm the server made the outbound request.
  • Campaign 2 (enumeration): two IPs launched a high-volume probe across 73+ model endpoints, generating 80,469 sessions in ~11 days to fingerprint exposed LLM proxies.
  • Fingerprint prompts: the probe used deliberately innocuous queries (e.g., “How many states are there…?”, “What model are you?”), likely chosen to reveal which backend responds without tripping content filters.
  • Model coverage: the probe list included OpenAI-compatible and Gemini formats, spanning OpenAI, Anthropic, Meta (Llama), DeepSeek, Google (Gemini), Mistral, Qwen, xAI, etc.
  • Attribution posture: GreyNoise frames the SSRF/OAST campaign as possibly research/bug bounty behavior, but assesses the enumeration as a more concerning threat-actor recon pattern.
  • Defensive indicators: the post includes suggested blocks (OAST domains, IPs/ASNs, and JA4 fingerprints) and highlights egress filtering + rate limiting.
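The OAST validation channel above is detectable on your side: any outbound request from a model host to an interactsh-style callback domain is a strong signal that an SSRF probe succeeded. A minimal sketch, assuming a simple `timestamp src dst` egress log format (an illustrative assumption) and the commonly cited ProjectDiscovery interactsh default domains — confirm the current indicator list from the GreyNoise post before relying on it:

```python
# Sketch: flag outbound connections to ProjectDiscovery OAST (interactsh)
# callback domains in egress logs. Log format and domain list are
# illustrative assumptions, not a complete indicator set.

OAST_SUFFIXES = (
    ".oast.fun", ".oast.live", ".oast.me",
    ".oast.online", ".oast.pro", ".oast.site",
)

def find_oast_callbacks(log_lines):
    """Return (line_no, destination) pairs whose destination host ends in a
    known OAST suffix. Expects whitespace-separated 'timestamp src dst' lines."""
    hits = []
    for i, line in enumerate(log_lines, start=1):
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        dst = parts[2].lower()
        if dst.endswith(OAST_SUFFIXES):
            hits.append((i, dst))
    return hits

if __name__ == "__main__":
    sample = [
        "2026-01-10T12:00:01Z 10.0.0.5 registry.ollama.ai",
        "2026-01-10T12:00:02Z 10.0.0.5 abc123.oast.fun",
    ]
    for line_no, dst in find_oast_callbacks(sample):
        print(f"line {line_no}: OAST callback to {dst}")
```

In practice you would feed this from your firewall or DNS resolver logs; a hit from a host that has no business making arbitrary outbound requests is worth an immediate alert.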

Why it matters

  • LLM endpoints are becoming “internet services”: once you expose an OpenAI-compatible API, you inherit the same recon/scan lifecycle as any other service.
  • Misconfigured proxies are a real prize: if a proxy forwards to paid/commercial APIs, attackers can turn it into free inference (or a foothold for deeper access) via simple enumeration.
  • Egress is security: the SSRF angle is a reminder that outbound connectivity from model hosts can be the exploit confirmation path.

What to do

  1. Require auth + isolate: keep LLM APIs off the public internet where possible; where that isn't possible, enforce strong auth and per-tenant rate limits.
  2. Detect the “innocuous fingerprint” pattern: alert on rapid requests across many model names/endpoints, especially using the exact prompt strings GreyNoise highlighted.
  3. Defensive validation (safe): run an internal scan of your environment for accidentally exposed OpenAI-compatible routes (e.g., /v1/chat/completions, /v1/models) on non-edge hosts.
  4. Egress filtering: restrict model servers from making arbitrary outbound HTTP requests; explicitly allow only required registries and upstream APIs.
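Step 3 above can be sketched as a small stdlib-only scanner for your own network. The host list and Ollama default port here are illustrative assumptions; run this only against systems you are authorized to test:

```python
# Sketch: check internal hosts for accidentally exposed OpenAI-compatible
# routes. Hosts/ports are illustrative assumptions; authorized use only.
import urllib.error
import urllib.request

ROUTES = ("/v1/models", "/v1/chat/completions")

def probe_host(base_url, timeout=3):
    """Return (route, status) pairs for routes under base_url that answer.
    Any response other than 404 (including 401/403/405) confirms the
    endpoint is routable and worth reviewing."""
    exposed = []
    for route in ROUTES:
        url = base_url.rstrip("/") + route
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                exposed.append((route, resp.status))
        except urllib.error.HTTPError as e:
            if e.code != 404:  # 404 = route absent; 401/405 still confirm it
                exposed.append((route, e.code))
        except (urllib.error.URLError, OSError):
            pass  # host unreachable or connection refused
    return exposed

if __name__ == "__main__":
    # Example internal target on the default Ollama port (assumption).
    for host in ("http://10.0.0.12:11434",):
        for route, status in probe_host(host):
            print(f"{host}{route} -> HTTP {status}")
```

Note that a GET against /v1/chat/completions on a live proxy typically returns 405 (method not allowed), which still confirms exposure — the goal is inventory, not exploitation.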

Sources