arXiv — AgentDyn prompt injection benchmark

AI relevance: AgentDyn targets indirect prompt injection against tool-using agents, mirroring the same attack surface used by production copilots and workflow agents.

  • AgentDyn is a new benchmark for indirect prompt injection against real-world agent tasks.
  • It introduces 60 open-ended tasks with 560 injection test cases across Shopping, GitHub, and Daily Life domains.
  • Unlike static benchmarks, tasks require dynamic planning and include helpful third-party instructions that can be abused.
  • The authors highlight three benchmark gaps: lack of dynamic tasks, lack of helpful instructions, and simplistic user goals.
  • They evaluate 10 state-of-the-art defenses and find most are either insufficiently secure or over-blocking.
  • The release includes a public benchmark dataset and test harness for follow-on evaluations.

Why it matters

  • Indirect prompt injection is a top risk for deployed agents, and benchmarks shape what defenses get built and trusted.
  • Security teams need realistic, open-ended test cases to understand tradeoffs between safety and usability in agent workflows.

What to do

  • Re-run your agent defenses against AgentDyn-style open-ended tasks, not just static datasets.
  • Track over-defense metrics (false blocks) alongside attack success rate.
  • Instrument tool boundaries (allowlists, provenance, least-privilege) before relying on prompt filters.

Sources