arXiv — From prompt injections to protocol exploits

• Category: Research

AI relevance: If you deploy LLM agents with tool access (and especially MCP/A2A-style connectivity), this survey is a useful map of where real-world compromises happen across the full stack: untrusted inputs → model behavior → system boundaries → protocol plumbing.

  • Big picture: the paper argues agent security failures rarely stay “just prompt injection” — once a model can call tools, the failure mode becomes a workflow compromise with real side effects.
  • It proposes an end-to-end threat model for LLM-agent ecosystems, covering both host→tool and agent→agent paths.
  • The taxonomy is split into four buckets: Input Manipulation, Model Compromise, System & Privacy Attacks, and Protocol Vulnerabilities (explicitly naming MCP/ACP/ANP/A2A).
  • One practical takeaway: defenses that only filter user prompts won’t help much against indirect injections arriving via retrieved docs, tickets, web pages, emails, or tool outputs.
  • Protocol-level focus is timely: as teams adopt shared agent/tool protocols for discoverability and orchestration, capability discovery + authz + transport become part of the attack surface.
  • It highlights “brittle integrations” as a recurring root cause: credentials sprawl, ad-hoc adapters, and unclear approval boundaries when the agent escalates from “read” to “act”.
  • The survey is also a useful checklist of attacker objectives in agent systems: data exfil, tool abuse, persistence/backdoors, retrieval poisoning, and cross-agent interference.

Why it matters

  • Most orgs are standardizing on an “agent layer” (connectors + protocols) faster than they’re standardizing on security primitives (least privilege, provenance, isolation, approvals, auditing).
  • As MCP/A2A-like patterns spread, we should expect more failures that look like classic distributed-systems security bugs: confused deputy, weak trust boundaries, over-broad credentials, and unintended data flows.

What to do

  • Model your agent like a service account: explicitly list tools, data sources, and destinations it can touch; make the “write” paths very small.
  • Separate retrieval from action: treat retrieved content as attacker-controlled; require an explicit policy gate before tool execution (especially for network/file/email/issue-tracker writes).
  • Harden protocol endpoints: authenticate MCP servers, log tool calls and inputs/outputs, and apply per-tool rate limits and egress controls.
  • Plan for poisoning: monitor and version your RAG corpora and connector configurations (who changed what, when), and add rollback paths.

Sources