Microsoft Defender Guide — Memory Poisoning, Jailbreaks, Evasion for AI Agents
AI relevance: Microsoft's Azure Dev Community Hub published a comprehensive defense-oriented guide mapping four attack surfaces—memory poisoning, cross-prompt injection, jailbreaks, and evasion techniques—to concrete mitigations for agentic AI and RAG deployments.
Key Findings
- The guide categorizes AI-specific attack surfaces into memory poisoning (LLM04, LLM08), cross-prompt injection (LLM01), jailbreaks (LLM01/02/05), and evasion techniques targeting input moderation filters.
- Memory poisoning targets runtime memory—the dynamic knowledge agents accumulate—rather than training data. Research on the MINJA attack demonstrated injection success rates exceeding 95% in controlled evaluations.
- As few as 250 malicious documents may suffice to backdoor LLMs through RAG-based memory poisoning, according to published Anthropic research.
- The Agent Security Bench (ASB) benchmark reported over 84% average attack success rate across 27 attack/defense combinations in e-commerce, healthcare, and finance scenarios.
- Microsoft outlines defense strategies including trust-aware retrieval with composite trust scores, input sanitization pipelines, and runtime guardrails using Azure AI Content Safety and Prompt Shields.
- In practice, attackers combine techniques: ROT13 or ASCII smuggling evasion to deliver cross-prompt injection payloads hidden in documents that then poison agent memory.
Why It Matters
As enterprises deploy agents with persistent memory, RAG pipelines, and multi-turn reasoning, the attack surface shifts from traditional code vulnerabilities to reasoning-level compromises. An attacker can influence agent behavior across sessions by corrupting its memory stores—a threat model traditional application security tooling was never designed to handle. The guide provides a structured mapping of OWASP LLM Top 10 categories to concrete enterprise defense patterns.
What to Do
- Audit your agent's memory architecture: identify which memory types (in-context, episodic, semantic via vector DBs, tool-state caching) your agents use and assess trust boundaries for each.
- Implement trust-aware retrieval: score memory entries by source reputation, temporal freshness, and known attack patterns before injection into context windows.
- Deploy runtime guardrails that inspect both user input and retrieved context for injection markers before they reach the model.
- Treat agent memory stores as high-value assets—monitor for anomalous write patterns, unexpected content insertions, or bulk document ingestion.