Cybersecurity Insiders — Three Prompt Injection Detection Blind Spots

AI relevance: Prompt injection payloads don't generate the anomalous network or process telemetry that WAFs and EDR platforms alert on — SOC teams need dedicated RAG, agent-orchestration, and conversation-history logging to detect three distinct injection variants in production AI systems.

Overview

Prompt injection has emerged as a central detection-engineering gap in AI security. The attack doesn't arrive at the firewall looking like an attack: it arrives as a PDF a customer uploaded, a task description an AI agent retrieved, or a calendar invite a scheduling assistant processed. Cybersecurity Insiders published an analysis identifying three injection patterns that production detection stacks consistently miss.

The Three Variants

  1. Indirect injection via retrieved document content. Malicious instructions are embedded in documents the LLM retrieves at query time — a PDF in a customer file store, a Jira ticket, a Confluence page. The model processes the retrieval result and the embedded instruction executes with the retrieval session's privileges. WAFs and EDR see nothing because the payload arrives as legitimate HTTP from the LLM's own retrieval call.
  2. Second-order injection through AI agent tool calls. In agentic deployments where the LLM calls external tools (web search, code execution, database queries), an attacker injects malicious instructions into the tool's output. The agent calls the tool legitimately; the tool returns attacker-controlled content; the agent executes the embedded instruction in its next reasoning step. Detection requires agent orchestration logs, not the tool's own logs.
  3. Conversation-history poisoning for behavioral context shift. In persistent multi-turn LLM deployments, an attacker uses a sequence of seemingly benign turns to progressively shift the model's behavioral context. By the time the harmful instruction is issued, the model's prior context treats it as consistent with an established pattern. Detection requires turn-level behavioral analysis across rolling windows.
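
The rolling-window analysis named in the third variant could be sketched roughly as follows. This is a minimal illustration, not a production detector: the `Turn` type, the `SYSTEM_ROLE_PATTERN` phrases, the window size of 10, and the `ratio_jump` factor are all assumptions chosen for the example, and only two signals (turn-length ratio inversion and system-role language in user turns) are covered.

```python
import re
from collections import deque
from dataclasses import dataclass

# Hypothetical phrases that read like system-role language; tune per deployment.
SYSTEM_ROLE_PATTERN = re.compile(
    r"\b(you are now|ignore (all )?previous instructions|system prompt|as the system)\b",
    re.IGNORECASE,
)


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def window_signals(turns):
    """Compute simple drift signals for one window of turns."""
    user_chars = sum(len(t.text) for t in turns if t.role == "user")
    assistant_chars = sum(len(t.text) for t in turns if t.role == "assistant")
    system_like = sum(
        1 for t in turns if t.role == "user" and SYSTEM_ROLE_PATTERN.search(t.text)
    )
    ratio = user_chars / assistant_chars if assistant_chars else float("inf")
    return {"user_to_assistant_ratio": ratio, "system_like_user_turns": system_like}


def scan_session(turns, window_size=10, ratio_jump=3.0):
    """Slide a fixed-size window over a session; return (turn_index, signals) alerts."""
    window = deque(maxlen=window_size)
    baseline_ratio = None
    alerts = []
    for index, turn in enumerate(turns):
        window.append(turn)
        if len(window) < window_size:
            continue  # wait until the first full window
        signals = window_signals(window)
        ratio = signals["user_to_assistant_ratio"]
        if baseline_ratio is None:
            baseline_ratio = ratio  # the first full window sets the baseline
        # "Inversion": the user side was shorter at baseline and is now longer.
        inverted = baseline_ratio < 1.0 <= ratio and ratio >= baseline_ratio * ratio_jump
        if inverted or signals["system_like_user_turns"] > 0:
            alerts.append((index, signals))
    return alerts
```

Calling `scan_session(session_turns)` on a stored session returns the turn indexes at which a window tripped either signal. A real deployment would likely replace the fixed baseline with a per-user or per-application baseline.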

Why It Matters

OWASP's Top 10 for LLM Applications ranks prompt injection as the most severe risk category. MITRE ATLAS maps three prompt injection subtechniques under initial-access (AML.T0054.001/002/003). None of them generates the telemetry signals that SOC detection stacks are calibrated for. Teams that treat "prompt injection" as a single category end up with blind spots for the variants they don't specifically instrument. Each of the three requires a different telemetry source: RAG pipeline logs for indirect injection, agent orchestration tool-call pairs for second-order injection, and conversation-session metadata with turn-level analysis for history poisoning.
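
As one illustration of such a telemetry source, an agent orchestration tool-call pair could be captured with a thin wrapper around each tool invocation. The sketch below is hypothetical: `ToolCallRecord`, `logged_tool_call`, the `sink` list, and the `INSTRUCTION_PATTERN` heuristic are names and choices made up for this example, not part of any particular agent framework.

```python
import hashlib
import json
import re
import time
from dataclasses import asdict, dataclass

# Hypothetical heuristic: a line in the tool output that opens like a command.
INSTRUCTION_PATTERN = re.compile(
    r"^\s*(ignore|disregard|you must|do not|send|execute|forward)\b",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass
class ToolCallRecord:
    task_id: str            # originating task context
    tool_name: str
    tool_input: str
    tool_output: str
    output_sha256: str
    instruction_like: bool  # True if the output looks like an instruction
    timestamp: float


def logged_tool_call(task_id, tool_name, tool_fn, tool_input, sink):
    """Call the tool, append a JSON record of the input/output pair to sink,
    and return the tool output unchanged."""
    output = tool_fn(tool_input)
    record = ToolCallRecord(
        task_id=task_id,
        tool_name=tool_name,
        tool_input=tool_input,
        tool_output=output,
        output_sha256=hashlib.sha256(output.encode("utf-8")).hexdigest(),
        instruction_like=bool(INSTRUCTION_PATTERN.search(output)),
        timestamp=time.time(),
    )
    sink.append(json.dumps(asdict(record)))
    return output
```

Here `sink` is any object with an `append` method, such as a list in tests or a log shipper in production. The pattern check flags the tool output itself; correlating it with the agent's next reasoning step would require the orchestrator's own trace, which this sketch does not model.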

What to Do

  • Instrument RAG retrieval logging: Log every retrieval operation that feeds the model's context window, including the source URL, the document hash, and the content slice retrieved. Alert on retrieved content that shows a high density of imperative verb structures or that matches known injection signature patterns.
  • Log tool-call input/output pairs: In agentic deployments, record every tool-call input and output with the originating task context. Alert on tool outputs that produce instruction-formatted text in the next reasoning turn.
  • Monitor conversation history drift: Segment persistent sessions into rolling windows (e.g., 10 turns). Alert on sessions where the user-to-assistant turn-length ratio inverts sharply, topic entropy drops, or system-role language appears in user turns.
  • Treat retrieval sources as untrusted: Even "internal" documents (Confluence, Jira, shared drives) must be assumed potentially attacker-writable. Apply the same input validation to retrieval pipelines as to user-facing input.
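
The retrieval logging in the first bullet above might look something like the following sketch. Everything specific in it is an assumption made for illustration: the `RetrievalRecord` fields beyond source URL, hash, and content slice, the `IMPERATIVE_VERBS` word list, the two `SIGNATURES` patterns, and the 0.3 density threshold. Counting sentences that open with a listed verb is only a crude stand-in for real imperative detection.

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical word list and signatures; a real deployment would maintain its own.
IMPERATIVE_VERBS = {
    "ignore", "disregard", "override", "forget", "reveal",
    "send", "execute", "forward", "delete", "output",
}
SIGNATURES = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the|your) system prompt", re.IGNORECASE),
]


@dataclass
class RetrievalRecord:
    source_url: str
    document_sha256: str
    content_slice: str
    imperative_density: float
    signature_hits: int
    alert: bool


def imperative_density(text):
    """Fraction of sentences whose first word is in IMPERATIVE_VERBS."""
    sentences = [s.strip() for s in re.split(r"[.!?\n]+", text) if s.strip()]
    if not sentences:
        return 0.0
    imperative = sum(
        1 for s in sentences
        if s.split()[0].lower().strip(",:;\"'") in IMPERATIVE_VERBS
    )
    return imperative / len(sentences)


def log_retrieval(source_url, document_bytes, content_slice, density_threshold=0.3):
    """Build a log record for one retrieval and decide whether it should alert."""
    density = imperative_density(content_slice)
    hits = sum(1 for pattern in SIGNATURES if pattern.search(content_slice))
    return RetrievalRecord(
        source_url=source_url,
        document_sha256=hashlib.sha256(document_bytes).hexdigest(),
        content_slice=content_slice,
        imperative_density=density,
        signature_hits=hits,
        alert=hits > 0 or density >= density_threshold,
    )
```

The function treats every source the same way, internal or external, which matches the advice to treat retrieval sources as untrusted. The returned record would be written to whatever log pipeline the SOC already ingests.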

Sources