LLMjacking: Five Routes Attackers Use to Steal Inference
AI relevance: Threat actors now wire live LLM APIs directly into malware so it can adapt its behavior at runtime on infected hosts, and joint research from Kodem Security and Intezer maps how they obtain model inference without paying for it or being identified.
What happened
- Top Cybench-ranked models (Claude Opus 4.6, Claude Sonnet 4.5, Grok 4) can write functional exploit code, reason through credential chains, and sustain multi-step reconnaissance workflows — capabilities previously requiring human expertise.
- Malware families now embed live LLM API calls rather than generating payloads offline, enabling runtime adaptation on infected hosts.
- Underground forums sell cyber-oriented LLMs (WormGPT, GhostGPT, KawaiiGPT, Xanthorox). These are fine-tuned open-weight models or jailbroken wrappers, marketed as having no content filters and useful for phishing and simple malware stubs.
- Attackers access frontier models through third-party payment services (PayWithMoon, AIMLAPI) that accept cryptocurrency without identity verification, leaving investigators with a dead-end payment trail.
- Free-tier inference APIs from Groq, Cerebras, Cohere, Mistral, HuggingFace, OpenRouter, and SambaNova offer usable credentials requiring only a disposable email — some allow millions of tokens per month at zero cost.
- Keyless endpoints like Pollinations.ai and DuckDuckGo's Duck.ai provide OpenAI-compatible access with no authentication at all.
- The LameHug/PROMPTSTEAL malware family calls HuggingFace's Inference API for Qwen 2.5-Coder-32B-Instruct to drive reconnaissance and data theft with no embedded credentials.
- Exposed API keys in GitHub repos, config files, and compiled apps remain a major route — attackers scan VirusTotal submissions and public repositories for leaked provider tokens.
- Self-hosted LLM servers (Ollama, vLLM) left exposed on the internet provide unauthenticated access to inference and model weights.
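The exposed-server route is straightforward to test for on infrastructure you own. The following is a minimal sketch under stated assumptions, not the researchers' method: it requests a host's model-listing endpoint and reports whether the server answers without credentials. The default ports (11434 for Ollama, 8000 for vLLM), the helper names (`probe_url`, `is_open`, `check`), and the response-shape heuristic are illustrative assumptions; run it only against hosts you are authorized to test.

```python
import http.client
import json
import urllib.error
import urllib.request

# Default ports and model-listing paths. These are assumptions about a
# stock deployment; adjust them if your servers use other ports.
PROBES = {
    "ollama": (11434, "/api/tags"),  # Ollama lists locally stored models here
    "vllm": (8000, "/v1/models"),    # vLLM's OpenAI-compatible model list
}


def probe_url(host: str, server: str) -> str:
    """Build the model-listing URL for the given server type."""
    port, path = PROBES[server]
    return f"http://{host}:{port}{path}"


def is_open(status: int, body: str) -> bool:
    """Heuristic: a 200 response carrying a JSON model list means no auth was required."""
    if status != 200:
        return False
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    # Ollama answers with {"models": [...]}, OpenAI-style servers with {"data": [...]}.
    return isinstance(data, dict) and ("models" in data or "data" in data)


def check(host: str, server: str, timeout: float = 3.0) -> bool:
    """Return True if the host serves its model list without credentials."""
    try:
        with urllib.request.urlopen(probe_url(host, server), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return is_open(resp.status, body)
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, timed out, or rejected (for example HTTP 401): treat as not open.
        return False
```

Calling `check("127.0.0.1", "ollama")` exercises the local instance; running the same check from outside your network against the host's public address shows whether the server is reachable from the internet without authentication.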
Why it matters
LLM inference has become a weaponizable resource. When malware can call a frontier model at runtime, it gains capabilities that static payloads cannot match — adaptive exploit generation, context-aware phishing, and dynamic C2 behavior. The barrier to entry is near zero: free tiers, anonymous payment services, and exposed servers all provide functional access without identity verification.
What to do
- Never hardcode LLM provider API keys in application code, configs, or scripts — use secrets management and rotate credentials regularly.
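- Scan repositories and compiled artifacts for leaked provider tokens before pushing code or shipping builds.
- Bind self-hosted inference servers (Ollama, vLLM) to localhost, or put them behind authentication and a firewall. Never bind them to 0.0.0.0 (all interfaces) on an internet-facing host without access control.
- Monitor LLM API usage for anomalies that suggest a compromised key, such as sudden spikes in token volume, requests from unfamiliar IP addresses, or calls to models your applications do not use.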
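The token-scanning step can be sketched in a few lines. This is a rough illustration, assuming prefix-based key formats (`hf_`, `gsk_`, `sk-ant-`, `sk-or-v1-`, and generic `sk-`) that are commonly observed but should be confirmed against each provider's documentation. The names `TOKEN_PATTERNS`, `scan_text`, and `scan_tree` are hypothetical, and the patterns will produce both false positives and misses.

```python
import re
from pathlib import Path

# Prefix-based patterns. These are assumptions drawn from commonly observed
# key formats, not authoritative definitions; verify before relying on them.
TOKEN_PATTERNS = {
    "huggingface": re.compile(r"\bhf_[A-Za-z0-9]{30,}\b"),
    "groq": re.compile(r"\bgsk_[A-Za-z0-9]{40,}\b"),
    "openrouter": re.compile(r"\bsk-or-v1-[0-9a-f]{64}\b"),
    "anthropic": re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}"),
    # Generic "sk-" keys, excluding the two more specific prefixes above.
    "sk_style": re.compile(r"\bsk-(?!ant-|or-v1-)[A-Za-z0-9_-]{20,}"),
}


def scan_text(text: str, source: str = "<memory>"):
    """Return (source, line number, provider, redacted token) for each match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for provider, pattern in TOKEN_PATTERNS.items():
            for match in pattern.finditer(line):
                # Keep only the first 8 characters so reports do not leak the key.
                redacted = match.group(0)[:8] + "..."
                findings.append((source, lineno, provider, redacted))
    return findings


def scan_tree(root: str):
    """Scan every readable file under root, skipping the .git directory."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        findings.extend(scan_text(text, str(path)))
    return findings
```

Running `scan_tree(".")` from a pre-commit hook and failing the commit when the result is non-empty is one way to wire this in. Dedicated scanners such as gitleaks or truffleHog cover far more token formats and are a better fit for production use.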