arXiv — The Landscape of Prompt Injection Threats in LLM Agents

AI relevance: The paper analyzes prompt injection attacks and defenses specifically for LLM agents and introduces a new benchmark to evaluate agent security under realistic, context-dependent tasks.

  • SoK-style survey of prompt injection (PI) in LLM agents, covering attacks, defenses, and evaluation practices.
  • Introduces taxonomies that classify PI attacks by payload generation strategy (heuristic vs. optimization) and defenses by intervention stage (text, model, execution).
  • Reports a systemic gap: many defenses and benchmarks overlook context-dependent tasks where agents must use runtime observations to act.
  • Proposes AgentPI, a new benchmark aimed at evaluating agent behavior under context-dependent interaction settings.
  • Empirical evaluation with AgentPI finds no single defense achieves high trustworthiness, high utility, and low latency simultaneously.
  • Finds some defenses appear strong on existing benchmarks by suppressing context, but fail to generalize to realistic agent settings.
  • Distills open research problems and guidance for designing secure LLM agents.

Why it matters

  • Agentic systems are increasingly deployed in real workflows; benchmarks that ignore context-dependent reasoning can overstate security.
  • A structured taxonomy and a new benchmark provide clearer baselines for comparing defenses and identifying where they break in practice.

What to do

  • Re-evaluate defenses: If your controls rely on suppressing context, test them against tasks where context is essential.
  • Adopt AgentPI-style evaluation: Incorporate context-dependent interaction tests into your internal security benchmarks.
  • Track trade-offs: Measure trustworthiness, utility, and latency together when selecting PI defenses.

Sources