arXiv — From prompt injections to protocol exploits
• Category: Research
AI relevance: If you deploy LLM agents with tool access (and especially MCP/A2A-style connectivity), this survey is a useful map of where real-world compromises happen across the full stack: untrusted inputs → model behavior → system boundaries → protocol plumbing.
- Big picture: the paper argues agent security failures rarely stay “just prompt injection” — once a model can call tools, the failure mode becomes a workflow compromise with real side effects.
- It proposes an end-to-end threat model for LLM-agent ecosystems, covering both host→tool and agent→agent paths.
- The taxonomy is split into four buckets: Input Manipulation, Model Compromise, System & Privacy Attacks, and Protocol Vulnerabilities (explicitly naming MCP/ACP/ANP/A2A).
- One practical takeaway: defenses that only filter user prompts won’t help much against indirect injections arriving via retrieved docs, tickets, web pages, emails, or tool outputs.
- Protocol-level focus is timely: as teams adopt shared agent/tool protocols for discoverability and orchestration, capability discovery + authz + transport become part of the attack surface.
- It highlights “brittle integrations” as a recurring root cause: credentials sprawl, ad-hoc adapters, and unclear approval boundaries when the agent escalates from “read” to “act”.
- The survey is also a useful checklist of attacker objectives in agent systems: data exfil, tool abuse, persistence/backdoors, retrieval poisoning, and cross-agent interference.
Why it matters
- Most orgs are standardizing on an “agent layer” (connectors + protocols) faster than they’re standardizing on security primitives (least privilege, provenance, isolation, approvals, auditing).
- As MCP/A2A-like patterns spread, we should expect more failures that look like classic distributed-systems security bugs: confused deputy, weak trust boundaries, over-broad credentials, and unintended data flows.
What to do
- Model your agent like a service account: explicitly list tools, data sources, and destinations it can touch; make the “write” paths very small.
- Separate retrieval from action: treat retrieved content as attacker-controlled; require an explicit policy gate before tool execution (especially for network/file/email/issue-tracker writes).
- Harden protocol endpoints: authenticate MCP servers, log tool calls and inputs/outputs, and apply per-tool rate limits and egress controls.
- Plan for poisoning: monitor and version your RAG corpora and connector configurations (who changed what, when), and add rollback paths.
Sources
- arXiv HTML: From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows
- arXiv abstract: 2506.23260 • PDF: download
- Model Context Protocol (reference/spec hub): modelcontextprotocol.io