LLMjacking: Five Routes Attackers Use to Steal Inference

AI relevance: Threat actors now wire live LLM APIs directly into malware so it can adapt its behavior at runtime on infected hosts. Joint research from Kodem Security and Intezer maps the routes they use to obtain model inference for free or without revealing who they are.

What happened

  • Top Cybench-ranked models (Claude Opus 4.6, Claude Sonnet 4.5, Grok 4) can write functional exploit code, reason through credential chains, and sustain multi-step reconnaissance workflows — capabilities previously requiring human expertise.
  • Malware families now embed live LLM API calls rather than generating payloads offline, enabling runtime adaptation on infected hosts.
  • Underground forums sell cyber-oriented LLMs (WormGPT, GhostGPT, KawaiiGPT, Xanthorox). These are fine-tuned open-weight models or jailbroken wrappers marketed as having no content filters; they are useful for phishing lures and simple malware stubs.
  • Attackers reach frontier models through third-party intermediaries (PayWithMoon, AIMLAPI) that accept cryptocurrency without identity verification, leaving investigators no payment trail to follow.
  • Free-tier inference APIs from Groq, Cerebras, Cohere, Mistral, HuggingFace, OpenRouter, and SambaNova offer usable credentials requiring only a disposable email — some allow millions of tokens per month at zero cost.
  • Keyless endpoints like Pollinations.ai and DuckDuckGo's Duck.ai provide OpenAI-compatible access with no authentication at all.
  • The LameHug/PROMPTSTEAL malware family calls HuggingFace's Inference API for Qwen 2.5-Coder-32B-Instruct to drive reconnaissance and data theft with no embedded credentials.
  • Exposed API keys in GitHub repos, config files, and compiled apps remain a major route — attackers scan VirusTotal submissions and public repositories for leaked provider tokens.
  • Self-hosted LLM servers (Ollama, vLLM) left exposed on the internet provide unauthenticated access to inference and model weights.
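Exposure of a self-hosted server usually comes down to its bind address. The following is a minimal Python sketch, not a complete audit, that classifies a bind setting such as Ollama's `OLLAMA_HOST` environment variable or the value passed to vLLM's `--host` flag. The helper names `bind_host` and `is_exposed` are hypothetical, and the check only inspects the configured address: it cannot tell whether a firewall or an authenticating proxy sits in front of the port.

```python
import ipaddress
import os
from urllib.parse import urlsplit


def bind_host(value: str) -> str:
    """Extract the host from a bind setting such as '0.0.0.0',
    '0.0.0.0:11434', 'http://[::]:8000', or 'localhost'."""
    value = value.strip()
    try:
        # A bare IPv4/IPv6 literal with no port, e.g. '::1' or '0.0.0.0'.
        return str(ipaddress.ip_address(value))
    except ValueError:
        pass
    if "://" not in value:
        value = "//" + value  # make urlsplit treat it as host[:port]
    return urlsplit(value).hostname or ""


def is_exposed(value: str) -> bool:
    """True if a server bound to `value` may accept connections from
    other machines (wildcard or non-loopback address)."""
    host = bind_host(value)
    if host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Empty hosts and names other than localhost could resolve to
        # routable addresses, so flag them for manual review.
        return True
    # 0.0.0.0 and :: are wildcards (all interfaces), not loopback.
    return not addr.is_loopback


if __name__ == "__main__":
    # Assumption: fall back to Ollama's default bind of 127.0.0.1:11434.
    setting = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
    status = "EXPOSED" if is_exposed(setting) else "loopback only"
    print(f"OLLAMA_HOST={setting!r}: {status}")
```

Running it with `OLLAMA_HOST=0.0.0.0` reports the setting as exposed. Unknown or unparseable hosts are deliberately treated as exposed, trading a few false alarms for not missing a wildcard bind.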

Why it matters

LLM inference has become a weaponizable resource. When malware can call a frontier model at runtime, it gains capabilities that static payloads cannot match — adaptive exploit generation, context-aware phishing, and dynamic C2 behavior. The barrier to entry is near zero: free tiers, anonymous payment services, and exposed servers all provide functional access without identity verification.

What to do

  • Never hardcode LLM provider API keys in application code, configs, or scripts — use secrets management and rotate credentials regularly.
  • Scan repositories and compiled artifacts for leaked provider tokens before code is pushed to version control or builds are shipped.
  • Bind self-hosted inference servers (Ollama, vLLM) to localhost, or put authentication in front of them; never leave an unauthenticated server listening on 0.0.0.0.
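  • Monitor LLM API usage for patterns that suggest a compromised key, such as sudden spikes in token volume, calls to models your applications do not use, or requests from unfamiliar IP ranges.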
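Token scanning, whether by an attacker trawling VirusTotal or a defender auditing a release, is mostly pattern matching on known key prefixes. The minimal Python sketch below assumes the prefixes `sk-` (OpenAI), `sk-ant-` (Anthropic), `hf_` (Hugging Face) and `gsk_` (Groq) plus rough minimum lengths; these shapes are assumptions, not published specifications, and dedicated tools such as gitleaks or trufflehog maintain far larger rule sets. The names `PATTERNS`, `scan_text` and `scan_tree` are hypothetical. Files are decoded as Latin-1 so that keys embedded in compiled binaries still show up as ASCII runs.

```python
import re
import sys
from pathlib import Path

# Approximate key shapes (assumptions based on commonly seen prefixes;
# providers can change formats, so treat these as heuristics only).
# The lookbehinds stop matches inside longer words such as "task-...".
PATTERNS = {
    "anthropic": re.compile(r"(?<![A-Za-z0-9_-])sk-ant-[A-Za-z0-9_-]{20,}"),
    "openai": re.compile(r"(?<![A-Za-z0-9_-])sk-(?!ant-)[A-Za-z0-9_-]{20,}"),
    "huggingface": re.compile(r"(?<![A-Za-z0-9_])hf_[A-Za-z0-9]{30,}"),
    "groq": re.compile(r"(?<![A-Za-z0-9_])gsk_[A-Za-z0-9]{20,}"),
}

MAX_BYTES = 10 * 1024 * 1024  # skip very large files


def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (provider, redacted_token) pairs found in text."""
    hits = []
    for provider, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Keep only the first 8 characters so reports don't leak keys.
            hits.append((provider, match.group(0)[:8] + "..."))
    return hits


def scan_tree(root: Path):
    """Yield (path, provider, redacted_token) for every file under root."""
    for path in root.rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            if path.stat().st_size > MAX_BYTES:
                continue
            # Latin-1 maps every byte to one character, so binaries decode
            # without errors and non-ASCII bytes break up candidate tokens.
            text = path.read_bytes().decode("latin-1")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for provider, redacted in scan_text(text):
            yield path, provider, redacted


if __name__ == "__main__":
    # Usage: python scan_keys.py ./repo ./dist
    for root in map(Path, sys.argv[1:]):
        for path, provider, redacted in scan_tree(root):
            print(f"{path}: possible {provider} key {redacted}")
```

Each finding prints with only its first eight characters, so the report itself does not leak the secret. A CI job could fail the build whenever the output is non-empty.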

Sources