arXiv: SoK on prompt injection attacks against agentic coding assistants
- Category: Research
- Paper: a Systematization of Knowledge (SoK) on prompt injection attacks targeting agentic coding assistants (Claude Code, Copilot, Cursor, Codex CLI, etc.).
- Core claim: once an assistant has file/shell/tool access, the attack surface expands from “bad outputs” to credential theft, code injection, and system compromise.
- Taxonomy: the authors propose a three-dimensional classification across delivery vectors, attack modalities, and propagation behaviors.
- Meta-analysis: they synthesize results from dozens of studies and argue that adaptive attacks still succeed at high rates even against many published defenses.
- Tool/skill ecosystems called out: skill registries and protocols like MCP increase capability, but they also create new “semantic layer” exploitation paths where the line between data and instructions blurs.
- Bottom line: treat prompt injection as a first-class vulnerability class that needs defense-in-depth at the architecture level, not just text filtering.
Why it matters
- This is the clearest security framing for “vibe coding with tools”: the model can behave perfectly and you can still lose if untrusted content can influence which tool gets called and with what arguments.
- MCP-style ecosystems look a lot like browser extension ecosystems: powerful, composable, and an obvious target for malicious plugins, poisoned repos, and supply-chain tricks.
- It reinforces a practical engineering takeaway: the safe boundary is not “the prompt,” it’s the execution boundary (tool call, file write, network egress, credential access).
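One way to picture "the boundary is the execution boundary" is a gate that ignores what the prompt says and looks only at what is about to happen and what has entered the context. The sketch below is illustrative and is not taken from the paper: the `Action` names, the `SENSITIVE` set, and the `Session` class are all hypothetical, and the rule "any untrusted content in context means sensitive actions need human approval" is an assumed, deliberately coarse policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    # Hypothetical action categories an agent runtime might distinguish.
    READ_FILE = "read_file"
    WRITE_FILE = "write_file"
    RUN_SHELL = "run_shell"
    NETWORK_EGRESS = "network_egress"
    READ_CREDENTIAL = "read_credential"


# Assumed split: actions that can cause harm outside the session.
SENSITIVE = {
    Action.WRITE_FILE,
    Action.RUN_SHELL,
    Action.NETWORK_EGRESS,
    Action.READ_CREDENTIAL,
}


@dataclass
class Session:
    # Sources of untrusted content that have entered the model's context.
    tainted_by: list[str] = field(default_factory=list)

    def ingest(self, source: str, trusted: bool) -> None:
        """Record that content from `source` was shown to the model."""
        if not trusted:
            self.tainted_by.append(source)

    def gate(self, action: Action) -> str:
        """Decide at the execution boundary, regardless of prompt text."""
        if action not in SENSITIVE:
            return "allow"
        if self.tainted_by:
            return "needs_human_approval"
        return "allow"


session = Session()
session.ingest("user prompt", trusted=True)
print(session.gate(Action.RUN_SHELL))   # allow

session.ingest("README.md from cloned repo", trusted=False)
print(session.gate(Action.READ_FILE))   # allow
print(session.gate(Action.RUN_SHELL))   # needs_human_approval
```

The gate never tries to decide whether the README was malicious. It only records that untrusted text was read, then tightens what the agent may do afterwards.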
What to do
- Assume repo content is hostile: READMEs, issues, PR descriptions, and docs are untrusted input. Treat them like you’d treat HTML in a browser.
- Put a policy engine in front of tools: allow-list tool categories, validate arguments, and block dangerous destinations/commands.
- Isolate execution: run agents in sandboxes/VMs with least privilege; don’t hand them long-lived credentials by default.
- Log & review: capture every tool call along with its stated reason and outputs, so you can detect “agent went weird” as a security incident, not a shrug.
- Harden MCP/tool servers: authenticate, scope tokens per tool, and treat tool servers as part of your attack surface (not “just integrations”).
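The policy-engine and logging items above can be sketched as a single wrapper that every tool call passes through. This is a minimal illustration under stated assumptions, not a production design and not the paper's method: the tool names, the allow-lists, and the `check` and `guarded_call` functions are hypothetical. A real policy would also need per-command argument rules (an allow-listed `git` can still push to an arbitrary remote) and path validation for file access.

```python
from __future__ import annotations

import json
import shlex
from urllib.parse import urlparse

# Every name and rule below is an illustrative assumption.
ALLOWED_TOOLS = {"read_file", "run_shell", "http_get"}
ALLOWED_HOSTS = {"pypi.org", "github.com"}
ALLOWED_COMMANDS = {"ls", "git", "pytest"}
SHELL_METACHARS = set(";|&$`><\n")

audit_log: list[dict] = []


def check(tool: str, args: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call."""
    if tool not in ALLOWED_TOOLS:
        return False, f"tool {tool!r} is not allow-listed"
    if tool == "http_get":
        host = urlparse(args.get("url", "")).hostname or ""
        if host not in ALLOWED_HOSTS:
            return False, f"egress to {host!r} is not allow-listed"
    if tool == "run_shell":
        command = args.get("command", "")
        # Reject chaining, pipes, redirects, and substitutions outright.
        if SHELL_METACHARS & set(command):
            return False, "shell metacharacters are not permitted"
        try:
            argv = shlex.split(command)
        except ValueError:
            return False, "command could not be parsed"
        if not argv or argv[0] not in ALLOWED_COMMANDS:
            return False, "command is not allow-listed"
    return True, "ok"


def guarded_call(tool: str, args: dict, agent_reason: str) -> bool:
    """Check policy and record the decision, whether allowed or denied."""
    allowed, policy_reason = check(tool, args)
    audit_log.append({
        "tool": tool,
        "args": args,
        "agent_reason": agent_reason,
        "allowed": allowed,
        "policy_reason": policy_reason,
    })
    return allowed


guarded_call("run_shell", {"command": "git status"}, "inspect working tree")
guarded_call("http_get", {"url": "https://evil.example/steal"},
             "README said to fetch setup script")
print(json.dumps(audit_log[-1], indent=2))
```

Denied calls are logged as well as allowed ones. A blocked egress attempt whose stated reason points back at a README is the kind of record that lets a reviewer treat the event as an incident.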