Microsoft — Poisoned MCP Tool Descriptions Make AI Agents Silently Exfiltrate Company Data

AI relevance: As enterprise AI agents gain write capabilities through MCP, tool descriptions become a new attack surface — poisoned text can redirect agent behavior to exfiltrate data while every individual step appears legitimate and authorized.

  • Microsoft Incident Response and Defender research teams published a detailed attack scenario showing how poisoned MCP tool descriptions can hijack AI agents and silently exfiltrate company data — without the agent breaking any access rules.
  • The attack targets the trust boundary between MCP components: tool descriptions are plain text that agents read to decide when and how to use tools. Because descriptions are just words, they can carry hidden instructions.
  • In the demonstrated scenario, a finance team's invoice-handling agent connects to an approved third-party "invoice enrichment" service. The attacker updates the tool's description — the name and visible summary stay the same, but hidden formatting notes instruct the agent to grab the last 30 unpaid invoices and attach them to the next API call.
  • MCP picks up description changes on the fly. In setups without a re-approval trigger, the poisoned version goes live with no additional review. The agent follows the hidden order, collects invoices, and sends them as part of a normal-looking request while quietly copying stolen data to an attacker-controlled server.
  • Each step the agent takes is individually legitimate: the tool was approved, the data query ran with the analyst's own permissions, and the outbound call went to an allowed server. The weakness lives in the trust boundary between components, not in any single system.
  • The deeper architectural problem: MCP mixes instructions and data in the same place. A tool's description lives in the agent's working memory alongside its real orders, so editing that description can steer the agent as effectively as rewriting its system prompt.
  • Microsoft also documented a related attack where a malicious GitHub issue could hijack an agent connected to the GitHub MCP server and exfiltrate data from private repositories — the tools were trusted and untouched; bad instructions rode in through issue text.
  • The research marks a shift from the earlier framing of AI risk as "what a model reads and writes" to "what an agent actually does." Against a reader, injection changes the output. Against an agent, it changes behavior.

Why it matters

This is the clearest demonstration yet that MCP's openness — its greatest strength — is also its most dangerous weakness. Enterprise agents built in Copilot Studio or Azure AI Foundry can reach into business systems and run multi-step jobs autonomously. When tool descriptions become an injection vector, every connected third-party integration becomes a potential attack surface with no traditional security boundary to defend.

What to do

  • Implement re-approval triggers for MCP tool description changes — treat description updates like code changes that require review before deployment.
  • Deploy output validation layers that detect anomalous data flows: if an agent suddenly starts attaching invoice data to routine queries, that should trigger an alert.
  • Scope agent permissions to the minimum required for each task. The invoice agent should not have access to all unpaid invoices by default.
  • Monitor MCP tool description changes as a security event. Log who changed what, when, and require explicit approval for production tool updates.
  • Consider human-in-the-loop approval for any agent action that exfiltrates data outside the organization, even if the data source and destination are both "allowed."

Sources