MCPTox — Tool Poisoning Benchmark Shows 73% Attack Success Rate on MCP Agents

2026-04-30 Security by al-ice.ai Editorial

AI relevance: Tool poisoning embeds malicious instructions directly in MCP server tool descriptions — no code execution needed — making every agent that connects to untrusted MCP servers vulnerable to credential theft and unauthorized operations.

Key Findings

First benchmark for MCP tool poisoning: MCPTox evaluates 20 prominent LLM agents against poisoned tool descriptions across 45 live MCP servers and 353 authentic tools.
72.8% attack success rate on o1-mini — and more capable models tend to be more susceptible, because the attack exploits their stronger instruction-following abilities.
Three attack templates generate 1,312 test cases across 10 risk categories, including SSH key exfiltration, file system abuse, and unauthorized API calls — all triggered through tool metadata alone.
Safety alignment is largely ineffective: even the most resistant model (Claude 3.7 Sonnet) refused fewer than 3% of poisoning attacks, because the malicious instructions use legitimate tools for unauthorized operations.
Distinct from indirect prompt injection: repurposing IPI benchmark payloads for tool poisoning yielded near-zero success, confirming this is a separate attack vector that existing benchmarks miss.
Attack mechanism: poisoned tool descriptions are injected during the MCP registration phase, entering the LLM's context before any user request — the agent then follows hidden rules embedded in seemingly legitimate tool metadata.

Why It Matters

Tool poisoning is fundamentally different from code-level vulnerabilities. The attack lives in plain text — tool descriptions that hosts load without scrutiny. An attacker doesn't need to compromise server code; they just need to publish a poisoned server to a registry. When an agent connects, the malicious instructions are baked into the context and executed alongside legitimate tool calls. The benchmark proves this works at scale across real-world servers and modern agents.

What to Do

Filter tool descriptions: Run heuristic or ML-based checks on MCP tool metadata before loading into agent context, flagging instructions that attempt to override agent behavior.
Least-privilege tool access: Scope MCP server permissions to only the tools your agent actually needs — reduce the blast radius when a poisoned tool is discovered.
Monitor tool call patterns: Alert on unusual tool invocations (e.g., a file tool accessing credential paths during an unrelated task).
Use MCPTox for testing: Run your agent against the MCPTox benchmark before deploying to production to establish a baseline robustness score.

MCPTox — Tool Poisoning Benchmark Shows 73% Attack Success Rate on MCP Agents

Key Findings

Why It Matters

What to Do

Sources