arXiv: SoK on prompt injection attacks against agentic coding assistants

• Category: Research

  • Paper: a Systematization of Knowledge (SoK) on prompt injection attacks targeting agentic coding assistants (Claude Code, Copilot, Cursor, Codex CLI, etc.).
  • Core claim: once an assistant has file/shell/tool access, the attack surface expands from “bad outputs” to credential theft, code injection, and system compromise.
  • Taxonomy: the authors propose a three-dimensional classification across delivery vectors, attack modalities, and propagation behaviors.
  • Meta-analysis: they synthesize results from dozens of studies and argue that adaptive attacks still succeed at high rates even against many published defenses.
  • Tool/skill ecosystems called out: skill registries and protocols like MCP (Model Context Protocol) increase capability, but they also create new “semantic layer” exploitation paths where data and instructions blur.
  • Bottom line: treat prompt injection as a first-class vulnerability class that needs defense-in-depth at the architecture level, not just text filtering.

Why it matters

  • This is the clearest security framing for “vibe coding with tools”: the model can behave perfectly and you can still lose if untrusted content can influence which tool gets called and with what arguments.
  • MCP-style ecosystems look a lot like browser extension ecosystems: powerful, composable, and an obvious target for malicious plugins, poisoned repos, and supply-chain tricks.
  • It reinforces a practical engineering takeaway: the safe boundary is not “the prompt,” it’s the execution boundary (tool call, file write, network egress, credential access).

What to do

  1. Assume repo content is hostile: READMEs, issues, PR descriptions, and docs are untrusted input. Treat them like you’d treat HTML in a browser.
  2. Put a policy engine in front of tools: allow-list tool categories, validate arguments, and block dangerous destinations/commands.
  3. Isolate execution: run agents in sandboxes/VMs with least privilege; don’t hand them long-lived credentials by default.
  4. Log and review: capture every tool call along with its stated reason and its outputs, so you can treat “agent went weird” as a security incident, not a shrug.
  5. Harden MCP/tool servers: authenticate, scope tokens per tool, and treat tool servers as part of your attack surface (not “just integrations”).
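
As a rough illustration of steps 2 and 4, here is a minimal Python sketch of a policy check that sits between the agent and its tools and writes an audit record for every call. It is not taken from the paper. The tool names, the allow-lists, and the ToolCall, check, and guarded_call helpers are all hypothetical, and a real deployment would need far stricter argument validation than this.

```python
import json
import shlex
import time
from dataclasses import dataclass
from typing import Callable, Tuple
from urllib.parse import urlparse

# Hypothetical policy values, chosen only for illustration.
ALLOWED_TOOLS = {"read_file", "run_shell", "http_get"}
ALLOWED_COMMANDS = {"ls", "cat", "git", "pytest"}
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}
SHELL_METACHARS = set(";|&`$><\n")


@dataclass
class ToolCall:
    tool: str
    args: dict
    reason: str = ""  # the agent's stated justification, kept for review


def check(call: ToolCall) -> Tuple[bool, str]:
    """Return (allowed, explanation) for a proposed tool call."""
    if call.tool not in ALLOWED_TOOLS:
        return False, f"tool {call.tool!r} is not allow-listed"
    if call.tool == "run_shell":
        command = call.args.get("command", "")
        # Reject chaining, pipes, substitution, and redirection outright.
        if SHELL_METACHARS & set(command):
            return False, "shell metacharacters are not permitted"
        try:
            argv = shlex.split(command)
        except ValueError:
            return False, "command could not be parsed"
        if not argv or argv[0] not in ALLOWED_COMMANDS:
            return False, "command is not allow-listed"
    if call.tool == "http_get":
        host = urlparse(call.args.get("url", "")).hostname
        if host not in ALLOWED_HOSTS:
            return False, f"destination {host!r} is not allow-listed"
    return True, "ok"


def guarded_call(call: ToolCall, execute: Callable[[ToolCall], object], log):
    """Check policy, run the tool if allowed, and append one JSON audit line."""
    allowed, why = check(call)
    output = execute(call) if allowed else None
    record = {
        "ts": time.time(),
        "tool": call.tool,
        "args": call.args,
        "reason": call.reason,
        "allowed": allowed,
        "policy": why,
        "output": output,
    }
    log.write(json.dumps(record, default=str) + "\n")
    if not allowed:
        raise PermissionError(why)
    return output
```

The point of the design is that the decision is made at the execution boundary, on the concrete tool name and arguments, regardless of what text persuaded the model to propose the call. Denied calls are logged as well as allowed ones, since a burst of denials is itself a signal worth reviewing.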

Sources