arXiv — Indirect prompt injection competition findings for AI agents

AI relevance: The paper measures how often real agent workflows can be hijacked by untrusted content, directly affecting tool-calling, coding, and computer-use systems in production.

  • The study analyzes a large-scale public red-teaming competition focused on indirect prompt injection against AI agents.
  • Across 464 participants, the competition produced roughly 272,000 attack attempts and 8,648 successful attacks.
  • The scenarios span tool calling, coding, and computer use, which makes the findings more operationally useful than single-benchmark prompt-injection demos.
  • All evaluated frontier models were vulnerable, with reported attack success rates ranging from 0.5% to 8.5%.
  • The paper highlights concealment as a core risk: an agent can execute a harmful action while returning a final answer that looks benign to the user.
  • The authors found transferable attack strategies that worked across multiple behaviors and model families, suggesting structural weaknesses rather than one-off bugs.
  • Capability did not reliably predict robustness; stronger models were not automatically safer against indirect prompt injection.
  • The team says it plans quarterly benchmark updates, which matters because agent defenses and model behavior age quickly.

Why it matters

  • Indirect prompt injection is no longer a theoretical edge case; this paper shows it is repeatable at scale across realistic agent tasks.
  • The concealment angle is especially nasty for enterprise agents because audit gaps can hide compromise behind a normal-looking final response.
  • For teams deploying coding agents, browser agents, or workflow automations, the main lesson is that model quality alone will not solve the tool-abuse problem.

What to do

  • Treat retrieved content as untrusted instructions: label, isolate, and down-rank external text before it reaches tool-capable agents.
  • Log actions, not just answers: review tool invocations, file writes, and network access so concealed attacks are still observable.
  • Benchmark your own stack: run indirect prompt-injection tests against the exact agent workflows you operate, especially coding and browser-use paths.

Sources