MCPTox — Tool Poisoning Benchmark Shows 73% Attack Success Rate on MCP Agents

AI relevance: Tool poisoning embeds malicious instructions directly in MCP server tool descriptions — no code execution needed — making every agent that connects to untrusted MCP servers vulnerable to credential theft and unauthorized operations.

Key Findings

  • First benchmark for MCP tool poisoning: MCPTox evaluates 20 prominent LLM agents against poisoned tool descriptions across 45 live MCP servers and 353 authentic tools.
  • 72.8% attack success rate on o1-mini — and more capable models tend to be more susceptible, because the attack exploits their stronger instruction-following abilities.
  • Three attack templates generate 1,312 test cases across 10 risk categories, including SSH key exfiltration, file system abuse, and unauthorized API calls — all triggered through tool metadata alone.
  • Safety alignment is largely ineffective: even the most resistant model (Claude 3.7 Sonnet) refused fewer than 3% of poisoning attacks, because the malicious instructions use legitimate tools for unauthorized operations.
  • Distinct from indirect prompt injection: repurposing IPI benchmark payloads for tool poisoning yielded near-zero success, confirming this is a separate attack vector that existing benchmarks miss.
  • Attack mechanism: poisoned tool descriptions are injected during the MCP registration phase, entering the LLM's context before any user request — the agent then follows hidden rules embedded in seemingly legitimate tool metadata.

Why It Matters

Tool poisoning is fundamentally different from code-level vulnerabilities. The attack lives in plain text — tool descriptions that hosts load without scrutiny. An attacker doesn't need to compromise server code; they just need to publish a poisoned server to a registry. When an agent connects, the malicious instructions are baked into the context and executed alongside legitimate tool calls. The benchmark proves this works at scale across real-world servers and modern agents.

What to Do

  • Filter tool descriptions: Run heuristic or ML-based checks on MCP tool metadata before loading into agent context, flagging instructions that attempt to override agent behavior.
  • Least-privilege tool access: Scope MCP server permissions to only the tools your agent actually needs — reduce the blast radius when a poisoned tool is discovered.
  • Monitor tool call patterns: Alert on unusual tool invocations (e.g., a file tool accessing credential paths during an unrelated task).
  • Use MCPTox for testing: Run your agent against the MCPTox benchmark before deploying to production to establish a baseline robustness score.

Sources