arXiv — Poisoning the Watchtower: Prompt Injection Against LLM-Augmented SOC Analysts

AI relevance: LLM-based SOC copilots ingest attacker-controlled log fields (URLs, user agents, DNS queries, payloads), creating a structural failure mode where adversarial content in the log substrate itself becomes prompt injection — directly undermining the security tools designed to detect intrusions.

What happened

  • Pandey & Bhujang (DigitalOcean / Arizona State University) published arXiv:2605.24421, introducing "log-substrate prompt injection" — a class of attacks where adversary-controlled log fields carry instructions to LLM analyst assistants.
  • The paper defines a four-class taxonomy: S1 (direct override), S2 (persona hijack), S3 (context manipulation), and S4 (obfuscated payloads).
  • Evaluated 48 strategy-defense-task combinations using gpt-4o-mini as the analyst model.
  • Persona hijacks (S2) suppressed 68% of malicious-log classifications under naive prompting and remained effective against stronger defenses.
  • Summarization was the highest-risk task: context manipulation (S3) reached 96% injection success rate without defenses, dropping to only 38% under constrained-output defenses.
  • Direct overrides (S1) were surprisingly ineffective at 0% suppression — the model resisted straightforward instruction overrides even in log fields.
  • Even the strongest tested defense configuration only reduced average injection success from 26.6% to 11.8% — the attack surface was narrowed but not eliminated.
  • Simulation-based mock analysts substantially mispredicted real model behavior, particularly for direct overrides, suggesting that red-team simulations may underestimate or mischaracterize real-world risk.

Why it matters

As organizations deploy LLM copilots for triage, summarization, and incident response, the assumption that log data is "evidence" rather than "instructions" becomes a critical design flaw. An attacker who knows their target SOC uses LLM-based analysis can plant instructions directly in HTTP requests, DNS queries, or authentication attempts — turning the security tool itself against the defenders. The paper's finding that summarization (the most commonly marketed SOC AI use case) is the most vulnerable task is particularly alarming.

What to do

  • Treat raw log content as adversarial input in any LLM pipeline — apply input sanitization, field-level escaping, and context separation before passing to the model.
  • Use constrained output formats for classification and triage tasks to reduce injection surface (reduced S3 success from 96% to 38% in the study).
  • Avoid persona-based system prompts in SOC copilots — the paper shows persona hijacks remain effective even under stronger defenses.
  • Do not rely solely on simulation-based red-teaming — the paper demonstrates that mock analysts mispredict real model behavior, so empirical testing on the actual deployed model is essential.
  • Monitor for behavioral drift in LLM triage outputs — a sudden change in classification patterns or suppression rates may indicate ongoing log-substrate injection.

Sources