arXiv — The Landscape of Prompt Injection Threats in LLM Agents
AI relevance: The paper analyzes prompt injection attacks and defenses specifically for LLM agents and introduces a new benchmark to evaluate agent security under realistic, context-dependent tasks.
- Systematization-of-knowledge (SoK) style survey of prompt injection (PI) in LLM agents, covering attacks, defenses, and evaluation practices.
- Introduces taxonomies that classify PI attacks by payload generation strategy (heuristic vs. optimization) and defenses by intervention stage (text, model, execution).
- Reports a systemic gap: many defenses and benchmarks overlook context-dependent tasks where agents must use runtime observations to act.
- Proposes AgentPI, a new benchmark aimed at evaluating agent behavior under context-dependent interaction settings.
- Empirical evaluation with AgentPI finds that no single defense achieves high trustworthiness, high utility, and low latency simultaneously.
- Finds that some defenses score well on existing benchmarks by suppressing context but fail to generalize to realistic agent settings.
- Distills open research problems and guidance for designing secure LLM agents.
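The two taxonomy axes named above can be written down as a small data model for tagging attacks and defenses in an internal inventory. This is an illustrative sketch only: the enum members mirror the category names in this summary, while the class names, the `group_by_stage` helper, and the example defense names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class PayloadStrategy(Enum):
    """How an attack payload is generated (axis named in the summary)."""
    HEURISTIC = "heuristic"
    OPTIMIZATION = "optimization"


class InterventionStage(Enum):
    """Where a defense intervenes (axis named in the summary)."""
    TEXT = "text"
    MODEL = "model"
    EXECUTION = "execution"


@dataclass(frozen=True)
class Attack:
    name: str  # hypothetical label for your own inventory
    strategy: PayloadStrategy


@dataclass(frozen=True)
class Defense:
    name: str  # hypothetical label for your own inventory
    stage: InterventionStage


def group_by_stage(defenses):
    """Return {stage: [defense names]} with every stage present as a key."""
    grouped = {stage: [] for stage in InterventionStage}
    for defense in defenses:
        grouped[defense.stage].append(defense.name)
    return grouped


if __name__ == "__main__":
    # Placeholder names, not defenses evaluated in the paper.
    inventory = [
        Defense("input-filter", InterventionStage.TEXT),
        Defense("tool-call-policy", InterventionStage.EXECUTION),
    ]
    for stage, names in group_by_stage(inventory).items():
        print(stage.value, names)
```

Grouping by stage makes it easy to see which intervention stages an existing set of controls leaves uncovered.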
Why it matters
- Agentic systems are increasingly deployed in real workflows; benchmarks that ignore context-dependent reasoning can overstate security.
- A structured taxonomy and a new benchmark provide clearer baselines for comparing defenses and identifying where they break in practice.
What to do
- Re-evaluate defenses: If your controls rely on suppressing context, test them against tasks where context is essential.
- Adopt AgentPI-style evaluation: Incorporate context-dependent interaction tests into your internal security benchmarks.
- Track trade-offs: Measure trustworthiness, utility, and latency together when selecting PI defenses.
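One way to track the three metrics together is to log every evaluation run and summarize them side by side. The sketch below is a minimal illustration under stated assumptions: the `Trial` record, the metric definitions (utility as the completion rate on benign runs, trustworthiness as the share of attacked runs where the injected instruction was not followed), and the threshold values are all hypothetical and may differ from how AgentPI defines them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    attacked: bool          # was an injection present in this run?
    task_completed: bool    # did the agent finish the user's task?
    attack_succeeded: bool  # did the agent follow the injected instruction?
    latency_s: float        # wall-clock time for the run, in seconds


def _rate(flags):
    """Fraction of True values; raises if there is nothing to average."""
    flags = list(flags)
    if not flags:
        raise ValueError("no trials in this group")
    return sum(flags) / len(flags)


def summarize(trials):
    """Compute the three metrics for one defense (assumed definitions)."""
    benign = [t for t in trials if not t.attacked]
    attacked = [t for t in trials if t.attacked]
    return {
        "utility": _rate(t.task_completed for t in benign),
        "trustworthiness": _rate(not t.attack_succeeded for t in attacked),
        "latency_s": sum(t.latency_s for t in trials) / len(trials),
    }


def meets_all(summary, min_trust, min_utility, max_latency_s):
    """True only if all three targets are met at the same time."""
    return (
        summary["trustworthiness"] >= min_trust
        and summary["utility"] >= min_utility
        and summary["latency_s"] <= max_latency_s
    )


if __name__ == "__main__":
    # Made-up runs for illustration; not results from the paper.
    runs = [
        Trial(attacked=False, task_completed=True, attack_succeeded=False, latency_s=1.0),
        Trial(attacked=True, task_completed=True, attack_succeeded=True, latency_s=3.0),
        Trial(attacked=True, task_completed=False, attack_succeeded=False, latency_s=2.0),
    ]
    report = summarize(runs)
    print(report)
    print(meets_all(report, min_trust=0.9, min_utility=0.9, max_latency_s=2.5))
```

Requiring all three thresholds at once keeps a defense from looking good only because it trades away utility or speed.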