arXiv — From prompt injections to protocol exploits

2026-01-30 • Category: Research

AI relevance: If you deploy LLM agents with tool access (and especially MCP/A2A-style connectivity), this survey is a useful map of where real-world compromises happen across the full stack: untrusted inputs → model behavior → system boundaries → protocol plumbing.

Big picture: the paper argues agent security failures rarely stay “just prompt injection” — once a model can call tools, the failure mode becomes a workflow compromise with real side effects.
It proposes an end-to-end threat model for LLM-agent ecosystems, covering both host→tool and agent→agent paths.
The taxonomy is split into four buckets: Input Manipulation, Model Compromise, System & Privacy Attacks, and Protocol Vulnerabilities (explicitly naming MCP/ACP/ANP/A2A).
One practical takeaway: defenses that only filter user prompts won’t help much against indirect injections arriving via retrieved docs, tickets, web pages, emails, or tool outputs.
Protocol-level focus is timely: as teams adopt shared agent/tool protocols for discoverability and orchestration, capability discovery + authz + transport become part of the attack surface.
It highlights “brittle integrations” as a recurring root cause: credentials sprawl, ad-hoc adapters, and unclear approval boundaries when the agent escalates from “read” to “act”.
The survey is also a useful checklist of attacker objectives in agent systems: data exfil, tool abuse, persistence/backdoors, retrieval poisoning, and cross-agent interference.

Why it matters

Most orgs are standardizing on an “agent layer” (connectors + protocols) faster than they’re standardizing on security primitives (least privilege, provenance, isolation, approvals, auditing).
As MCP/A2A-like patterns spread, we should expect more failures that look like classic distributed-systems security bugs: confused deputy, weak trust boundaries, over-broad credentials, and unintended data flows.

What to do

Model your agent like a service account: explicitly list tools, data sources, and destinations it can touch; make the “write” paths very small.
Separate retrieval from action: treat retrieved content as attacker-controlled; require an explicit policy gate before tool execution (especially for network/file/email/issue-tracker writes).
Harden protocol endpoints: authenticate MCP servers, log tool calls and inputs/outputs, and apply per-tool rate limits and egress controls.
Plan for poisoning: monitor and version your RAG corpora and connector configurations (who changed what, when), and add rollback paths.

Sources

arXiv HTML: From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows
arXiv abstract: 2506.23260 • PDF: download
Model Context Protocol (reference/spec hub): modelcontextprotocol.io