arXiv — ASPI Shows Clarification-Seeking Amplifies Prompt Injection in LLM Agents

AI relevance: a widely recommended UX pattern — asking the agent to clarify ambiguous instructions before executing them — creates a state transition that dramatically increases prompt injection success rates across every frontier model tested.

Researchers from Scale API published ASPI (Ambiguous-State Prompt Injection), a benchmark of 728 task-attack scenarios that isolates the clarification-seeking agent state and measures its effect on injection vulnerability under controlled conditions. The paper is a reminder that "good UX" and "secure UX" can diverge sharply in agentic systems.

Key findings

  • Attack success on o3 rises from 1.8% in fully-specified execution to 34.0% when the agent enters clarification mode — a 19× increase.
  • Gemini 3-Flash shows a similar gap: 2.2% → 35.7%.
  • The effect is consistent across all ten frontier LLMs evaluated, spanning OpenAI, Google, Anthropic, and open-weight families.
  • Decomposition analysis attributes the gap to both a state-dependent shift in how models process incoming content and a channel-specific effect from the agent-solicited clarification interface.
  • The benchmark separates tool-returned adversarial content (execution setting) from user-provided clarification input (clarification setting), showing the latter is the dominant attack channel.

Why it matters

Clarification-seeking is baked into modern agent design: Copilot, Cursor, and custom ReAct loops all encourage agents to ask follow-up questions when intent is unclear. This paper shows that the very act of soliciting additional user input creates a widened attack surface that standard execution-time security evaluations completely miss. If your agent security testing only probes fully-specified tasks, you are systematically underestimating risk.

What to do

  • Test interactive flows separately. Run prompt injection evaluations against clarification-seeking agent paths, not just single-turn execution.
  • Apply input separation at the clarification boundary. Treat user-supplied clarification as untrusted input, even though the agent explicitly requested it.
  • Consider rate-limiting or sanitizing clarification rounds. Multi-turn clarification sessions compound exposure with each round-trip.
  • Use ASPI as a baseline. The benchmark and code are open source — drop it into your existing red-team pipeline.

Sources