arXiv — ASPI Shows Clarification-Seeking Amplifies Prompt Injection in LLM Agents
AI relevance: a widely recommended UX pattern — asking the agent to clarify ambiguous instructions before executing them — creates a state transition that dramatically increases prompt injection success rates across every frontier model tested.
Researchers from Scale API published ASPI (Ambiguous-State Prompt Injection), a benchmark of 728 task-attack scenarios that isolates the clarification-seeking agent state and measures its effect on injection vulnerability under controlled conditions. The paper is a reminder that "good UX" and "secure UX" can diverge sharply in agentic systems.
Key findings
- Attack success on o3 rises from 1.8% in fully-specified execution to 34.0% when the agent enters clarification mode — a 19× increase.
- Gemini 3-Flash shows a similar gap: 2.2% → 35.7%.
- The effect is consistent across all ten frontier LLMs evaluated, spanning OpenAI, Google, Anthropic, and open-weight families.
- Decomposition analysis attributes the gap to both a state-dependent shift in how models process incoming content and a channel-specific effect from the agent-solicited clarification interface.
- The benchmark separates tool-returned adversarial content (execution setting) from user-provided clarification input (clarification setting), showing the latter is the dominant attack channel.
Why it matters
Clarification-seeking is baked into modern agent design: Copilot, Cursor, and custom ReAct loops all encourage agents to ask follow-up questions when intent is unclear. This paper shows that the very act of soliciting additional user input creates a widened attack surface that standard execution-time security evaluations completely miss. If your agent security testing only probes fully-specified tasks, you are systematically underestimating risk.
What to do
- Test interactive flows separately. Run prompt injection evaluations against clarification-seeking agent paths, not just single-turn execution.
- Apply input separation at the clarification boundary. Treat user-supplied clarification as untrusted input, even though the agent explicitly requested it.
- Consider rate-limiting or sanitizing clarification rounds. Multi-turn clarification sessions compound exposure with each round-trip.
- Use ASPI as a baseline. The benchmark and code are open source — drop it into your existing red-team pipeline.