Microsoft Defender Guide — Memory Poisoning, Jailbreaks, Evasion for AI Agents

AI relevance: Microsoft's Azure Dev Community Hub published a comprehensive defense-oriented guide mapping four attack surfaces—memory poisoning, cross-prompt injection, jailbreaks, and evasion techniques—to concrete mitigations for agentic AI and RAG deployments.

Key Findings

  • The guide categorizes AI-specific attack surfaces into memory poisoning (LLM04, LLM08), cross-prompt injection (LLM01), jailbreaks (LLM01/02/05), and evasion techniques targeting input moderation filters.
  • Memory poisoning targets runtime memory—the dynamic knowledge agents accumulate—rather than training data. Research on the MINJA attack demonstrated injection success rates exceeding 95% in controlled evaluations.
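  • As few as 250 malicious documents may suffice to backdoor an LLM, according to published Anthropic research. That study poisoned training data rather than a retrieval store, so the figure indicates, rather than measures, how little content a RAG-based memory poisoning attack might need.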
  • The Agent Security Bench (ASB) benchmark reported a highest average attack success rate above 84%, evaluating 27 types of attack and defense methods in scenarios including e-commerce, healthcare, and finance.
  • Microsoft outlines defense strategies including trust-aware retrieval with composite trust scores, input sanitization pipelines, and runtime guardrails using Azure AI Content Safety and Prompt Shields.
  • In practice, attackers chain techniques: ROT13 encoding or ASCII smuggling evades input filters and delivers a cross-prompt injection payload hidden in a document, and that payload then poisons agent memory.
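
A minimal sketch of why evasion has to be undone before any filter runs: the same marker check is applied to the visible text, to its ROT13 decoding, and to any text hidden in invisible Unicode tag characters (the U+E0000 block commonly used for ASCII smuggling). The marker patterns and function names here are illustrative assumptions, not taken from the Microsoft guide, and a production filter would rely on a managed classifier rather than a regex.

```python
import codecs
import re

# Unicode tag characters (U+E0000..U+E007F) mirror ASCII but render invisibly.
TAG_START, TAG_END = 0xE0000, 0xE007F

# Hypothetical marker list, for illustration only.
INJECTION_MARKERS = re.compile(
    r"ignore (all )?previous instructions|disregard the system prompt",
    re.IGNORECASE,
)


def reveal_tag_text(text: str) -> str:
    """Decode invisible tag characters back to the printable ASCII they mirror."""
    return "".join(
        chr(ord(ch) - TAG_START)
        for ch in text
        if TAG_START + 0x20 <= ord(ch) <= TAG_START + 0x7E
    )


def strip_tag_chars(text: str) -> str:
    """Remove every tag character so only the visible text remains."""
    return "".join(ch for ch in text if not (TAG_START <= ord(ch) <= TAG_END))


def candidate_views(text: str) -> dict:
    """Build the alternative readings of the input that a filter should inspect."""
    visible = strip_tag_chars(text)
    return {
        "visible": visible,
        "rot13": codecs.decode(visible, "rot13"),
        "hidden_tags": reveal_tag_text(text),
    }


def find_injection(text: str) -> list:
    """Return the names of the views in which an injection marker matched."""
    return [
        name
        for name, view in candidate_views(text).items()
        if INJECTION_MARKERS.search(view)
    ]
```

A document whose visible text is harmless can still be flagged through the "hidden_tags" view, and a ROT13-encoded payload through the "rot13" view. The same check can run on retrieved context as well as on user input.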

Why It Matters

As enterprises deploy agents with persistent memory, RAG pipelines, and multi-turn reasoning, the attack surface shifts from traditional code vulnerabilities to reasoning-level compromises. An attacker can influence agent behavior across sessions by corrupting its memory stores—a threat model traditional application security tooling was never designed to handle. The guide provides a structured mapping of OWASP LLM Top 10 categories to concrete enterprise defense patterns.

What to Do

  • Audit your agent's memory architecture: identify which memory types (in-context, episodic, semantic via vector DBs, tool-state caching) your agents use and assess trust boundaries for each.
  • Implement trust-aware retrieval: score memory entries by source reputation, temporal freshness, and known attack patterns before they are added to the context window.
  • Deploy runtime guardrails that inspect both user input and retrieved context for injection markers before they reach the model.
  • Treat agent memory stores as high-value assets—monitor for anomalous write patterns, unexpected content insertions, or bulk document ingestion.
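
The trust-aware retrieval step above can be sketched as a weighted composite score that gates which memory entries reach the context window. Everything concrete here is an assumption made for illustration: the reputation table, the weights, the 90-day freshness half-life, the 0.6 threshold, and the pattern list are hypothetical values, not figures from the guide.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical values; each deployment would tune its own.
SOURCE_REPUTATION = {"internal_wiki": 0.9, "user_upload": 0.5, "web_scrape": 0.2}
WEIGHTS = {"source": 0.5, "freshness": 0.2, "pattern": 0.3}
TRUST_THRESHOLD = 0.6
HALF_LIFE_DAYS = 90.0

# Hypothetical attack-pattern list, for illustration only.
SUSPICIOUS = re.compile(
    r"ignore (all )?previous instructions|system prompt", re.IGNORECASE
)


@dataclass
class MemoryEntry:
    text: str
    source: str
    written_at: datetime


def trust_score(entry: MemoryEntry, now: datetime) -> float:
    """Combine source reputation, freshness, and pattern checks into [0, 1]."""
    source = SOURCE_REPUTATION.get(entry.source, 0.0)  # unknown sources score 0
    age_days = max((now - entry.written_at).total_seconds() / 86400, 0.0)
    freshness = 0.5 ** (age_days / HALF_LIFE_DAYS)  # halves every HALF_LIFE_DAYS
    pattern = 0.0 if SUSPICIOUS.search(entry.text) else 1.0
    return (
        WEIGHTS["source"] * source
        + WEIGHTS["freshness"] * freshness
        + WEIGHTS["pattern"] * pattern
    )


def filter_for_context(entries, now=None):
    """Keep only the entries whose composite score clears the threshold."""
    now = now or datetime.now(timezone.utc)
    return [e for e in entries if trust_score(e, now) >= TRUST_THRESHOLD]
```

With these example weights, a fresh entry from a low-reputation source that also matches an attack pattern scores 0.3 and is dropped, while a clean 90-day-old user upload scores 0.65 and is kept. Logging the dropped entries gives the monitoring step a signal about anomalous writes.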

Sources