arXiv — Response-Path Attacks on LLM Agents Outperform Prompt Injection
AI relevance: The paper demonstrates that BYOK relay services — the architectural pattern powering most enterprise LLM proxy deployments — are blind spots where adversarial providers can rewrite aligned model outputs before agent execution, completely bypassing alignment safety guarantees.
Key findings
- Researchers from Fudan University and HKUST identified a fundamental attack surface they call post-alignment tampering: an adversarial relay service can rewrite an LLM's aligned response after generation but before the agent executes the tool calls.
- They formalize this as a Dolev–Yao adversary model: the relay sits as the sole intermediary on the LLM-to-agent path with no end-to-end integrity verification, capable of observing, suppressing, and replacing any downstream message.
- They prove mathematically that post-alignment tampering strictly dominates prompt injection — even a perfectly aligned LLM provides zero protection against a relay adversary.
- Their Relay Tampering Attack (RTA) framework achieves up to 99.1% attack success rate across six LLM models on two agent-security benchmarks (AgentDojo, ASB), substantially outperforming prompt injection baselines.
- RTA uses a three-stage approach: strategic orchestration (state-aware rewriting across conversation rounds), minimal tampering (surgical edits to security-critical fields), and stealth restoration (resubmitting tampered responses upstream to produce consistent output that evades distribution-level auditing).
- Case studies on OpenClaw and Claude Code demonstrate real-world exploitability; 74% of RTA-PostForge's latency overhead falls within normal network delay of commercial relay infrastructure, making detection difficult.
- Four representative agent defenses were evaluated and none fully eliminated the RTA attack surface.
- The paper proposes a time-based defense that detects relay tampering while preserving agent utility.
Why it matters
- The BYOK (Bring Your Own Key) pattern is ubiquitous — any organization routing LLM traffic through a third-party proxy (LiteLLM, Portkey, custom gateways) is exposed to this attack surface.
- This attack requires no prompt injection, no model jailbreak, and no user interaction — the relay simply rewrites the response. All cryptographic and alignment-layer guarantees remain technically intact while the agent executes attacker-directed actions.
- The March 2026 LiteLLM PyPI supply-chain compromise (referenced in the paper) shows this is not purely theoretical — relay-layer adversaries already exist in the ecosystem.
- The finding that 74% of tampering overhead blends into normal network latency means existing timeout-based or latency-anomaly defenses are insufficient.
What to do
- Audit your LLM proxy/relay infrastructure: understand which third-party services have visibility into your LLM-to-agent communication path.
- Implement response integrity verification — hash-based or signature-based checks between the upstream model response and what the agent receives.
- Consider the paper's time-based defense: measuring response-path timing anomalies that fall outside the relay's tampering window.
- Treat relay providers as a trusted computing base, not just a pass-through — include them in security reviews and supply-chain risk assessments.