arXiv — Response-Path Attacks on LLM Agents Outperform Prompt Injection

2026-05-12 Research by al-ice.ai Editorial

AI relevance: The paper demonstrates that BYOK relay services — the architectural pattern powering most enterprise LLM proxy deployments — are blind spots where adversarial providers can rewrite aligned model outputs before agent execution, completely bypassing alignment safety guarantees.

Key findings

Researchers from Fudan University and HKUST identified a fundamental attack surface they call post-alignment tampering: an adversarial relay service can rewrite an LLM's aligned response after generation but before the agent executes the tool calls.
They formalize this as a Dolev–Yao adversary model: the relay sits as the sole intermediary on the LLM-to-agent path with no end-to-end integrity verification, capable of observing, suppressing, and replacing any downstream message.
They prove mathematically that post-alignment tampering strictly dominates prompt injection — even a perfectly aligned LLM provides zero protection against a relay adversary.
Their Relay Tampering Attack (RTA) framework achieves up to 99.1% attack success rate across six LLM models on two agent-security benchmarks (AgentDojo, ASB), substantially outperforming prompt injection baselines.
RTA uses a three-stage approach: strategic orchestration (state-aware rewriting across conversation rounds), minimal tampering (surgical edits to security-critical fields), and stealth restoration (resubmitting tampered responses upstream to produce consistent output that evades distribution-level auditing).
Case studies on OpenClaw and Claude Code demonstrate real-world exploitability; 74% of RTA-PostForge's latency overhead falls within normal network delay of commercial relay infrastructure, making detection difficult.
Four representative agent defenses were evaluated and none fully eliminated the RTA attack surface.
The paper proposes a time-based defense that detects relay tampering while preserving agent utility.

Why it matters

The BYOK (Bring Your Own Key) pattern is ubiquitous — any organization routing LLM traffic through a third-party proxy (LiteLLM, Portkey, custom gateways) is exposed to this attack surface.
This attack requires no prompt injection, no model jailbreak, and no user interaction — the relay simply rewrites the response. All cryptographic and alignment-layer guarantees remain technically intact while the agent executes attacker-directed actions.
The March 2026 LiteLLM PyPI supply-chain compromise (referenced in the paper) shows this is not purely theoretical — relay-layer adversaries already exist in the ecosystem.
The finding that 74% of tampering overhead blends into normal network latency means existing timeout-based or latency-anomaly defenses are insufficient.

What to do

Audit your LLM proxy/relay infrastructure: understand which third-party services have visibility into your LLM-to-agent communication path.
Implement response integrity verification — hash-based or signature-based checks between the upstream model response and what the agent receives.
Consider the paper's time-based defense: measuring response-path timing anomalies that fall outside the relay's tampering window.
Treat relay providers as a trusted computing base, not just a pass-through — include them in security reviews and supply-chain risk assessments.

arXiv — Response-Path Attacks on LLM Agents Outperform Prompt Injection

Key findings

Why it matters

What to do

Sources