arXiv — LLM-agent threat model and attack taxonomy survey
AI relevance: The paper maps real-world LLM-agent failures (prompt injection, tool abuse, protocol exploits) onto a unified threat model that applies directly to AI agent deployments.
- The survey proposes an end-to-end threat model for LLM-agent ecosystems, covering host-to-tool and agent-to-agent communications.
- It catalogs 30+ attack techniques spanning input manipulation, model compromise, system/privacy abuse, and protocol-level vulnerabilities.
- The taxonomy explicitly connects prompt-to-SQL injections with higher-layer agent workflows rather than treating them as isolated app bugs.
- Protocol-level examples include tool-schema confusion and cross-agent message tampering in multi-agent workflows.
- The authors highlight weak validation and ad-hoc auth across plugins/connectors as a systemic amplifier of risk.
- Mitigations emphasize dynamic trust management, provenance tracking, and sandboxing of tool interfaces.
- The paper maps incidents to CVE/NVD records to ground the taxonomy in real, published vulnerabilities.
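As an illustration of the cross-agent message tampering item above, here is a minimal Python sketch of tamper detection using an HMAC tag. It is not taken from the paper. The function names `sign_message` and `verify_message`, the envelope layout, and the assumption that both agents already share a secret key are all hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(message: dict) -> bytes:
    # Serialize deterministically so sender and receiver hash identical bytes.
    return json.dumps(message, sort_keys=True, separators=(",", ":")).encode()


def sign_message(key: bytes, message: dict) -> dict:
    """Wrap a message in an envelope carrying an HMAC-SHA256 tag (hypothetical layout)."""
    tag = hmac.new(key, _canonical(message), hashlib.sha256).hexdigest()
    return {"body": message, "tag": tag}


def verify_message(key: bytes, envelope: dict) -> bool:
    """Return True only if the tag matches the body under the shared key."""
    expected = hmac.new(key, _canonical(envelope["body"]), hashlib.sha256).hexdigest()
    received = str(envelope.get("tag", ""))
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected.encode(), received.encode())
```

A receiving agent would call `verify_message` before acting on any inter-agent message and drop envelopes that fail. This sketch covers integrity only. Replay protection (nonces or timestamps) and key distribution are out of scope.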
Why it matters
- Agent security failures increasingly blend classic injection bugs with protocol exploits, making narrow app-only reviews insufficient.
- Unified taxonomies help teams prioritize controls across prompts, tools, and inter-agent protocols instead of patching one layer at a time.
- Because the taxonomy is tied to public CVEs, it gives audits and risk sign-off a defensible, evidence-backed reference.
What to do
- Threat model by layer: separate input, tool, and protocol risks in your agent design reviews.
- Harden tool boundaries: enforce schema validation and least-privilege scopes on tool calls.
- Track provenance: log tool responses and agent decisions to support forensic triage.
- Map to CVEs: use the paper’s CVE/NVD mapping as a starting point for patch prioritization.
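The "harden tool boundaries" and "track provenance" steps above can be sketched together as a small gateway that every tool call passes through. This is a minimal illustration under stated assumptions, not a design from the paper. The `TOOL_SPECS` registry, the scope strings, and the `ToolGateway` class are hypothetical names, and a real deployment would likely use a full schema validator and durable log storage.

```python
import time
from dataclasses import dataclass, field

# Hypothetical registry: each tool declares its argument types and required scope.
TOOL_SPECS = {
    "read_file": {"args": {"path": str}, "scope": "fs:read"},
    "send_email": {"args": {"to": str, "body": str}, "scope": "mail:send"},
}


class ToolCallRejected(Exception):
    """Raised when a call fails schema validation or exceeds granted scopes."""


@dataclass
class ToolGateway:
    granted_scopes: frozenset
    audit_log: list = field(default_factory=list)

    def _record(self, event, tool, payload):
        # Provenance: keep an ordered, timestamped trail of calls and responses.
        self.audit_log.append(
            {"ts": time.time(), "event": event, "tool": tool, "payload": payload}
        )

    def validate(self, tool, args):
        spec = TOOL_SPECS.get(tool)
        if spec is None:
            raise ToolCallRejected(f"unknown tool: {tool}")
        # Least privilege: the agent must hold the scope this tool requires.
        if spec["scope"] not in self.granted_scopes:
            raise ToolCallRejected(f"scope {spec['scope']} not granted")
        # Schema check: exact argument names, then expected types.
        if set(args) != set(spec["args"]):
            raise ToolCallRejected("argument names do not match schema")
        for name, expected in spec["args"].items():
            if not isinstance(args[name], expected):
                raise ToolCallRejected(f"argument {name!r} has wrong type")

    def call(self, tool, args, impl):
        try:
            self.validate(tool, args)
        except ToolCallRejected as exc:
            self._record("rejected", tool, {"args": args, "reason": str(exc)})
            raise
        self._record("call", tool, {"args": args})
        result = impl(**args)
        self._record("response", tool, {"result": result})
        return result
```

For example, a gateway built with `ToolGateway(granted_scopes=frozenset({"fs:read"}))` would allow `read_file` calls, reject `send_email` with `ToolCallRejected`, and leave an entry in `audit_log` for each outcome.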