arXiv — ChatInject: abusing chat templates for prompt injection in LLM agents

2026-02-05 • Category: Research

AI relevance: The paper demonstrates a higher-success prompt-injection method that targets how agentic LLMs format and interpret chat templates.

ChatInject embeds malicious instructions inside content that mimics native chat-template structure, exploiting the model’s instruction-following bias rather than plain-text injections.
The authors propose a multi-turn persuasion variant that primes agents across several turns to accept suspicious actions, not just a single injected response.
Across benchmarks, the method boosts average attack success rates: 5.18% → 32.05% on AgentDojo and 15.13% → 45.90% on InjecAgent.
Multi-turn dialogues reach an average 52.33% success rate on InjecAgent, indicating compounding risk in longer agent conversations.
Chat-template payloads show transferability across models, remaining effective even when the target model’s internal template is unknown.
The paper reports that prompt-based defenses are largely ineffective against the chat-template and multi-turn variants.
The study focuses on indirect prompt injection — adversarial instructions embedded in external environment output that an agent ingests.

Why it matters

Agent builders often rely on message-formatting as a safety boundary; ChatInject suggests that boundary can be co-opted by attackers who imitate template structure.
Multi-turn persuasion increases real-world risk because production agents routinely operate over long conversations with external systems, not isolated prompts.
Defenses that only scan raw text may miss payloads that look like “normal” chat formatting.

Audit template handling in your agent framework: ensure system/user roles cannot be forged by untrusted inputs.
Instrument multi-turn monitoring to detect gradual persuasion patterns that shift agent intent over time.
Test against ChatInject-style prompts in red-team evaluations, especially for agents consuming external web or tool outputs.