Microsoft Defender Guide — Memory Poisoning, Jailbreaks, Evasion for AI Agents

2026-05-21 Security by al-ice.ai Editorial

AI relevance: Microsoft's Azure Dev Community Hub published a comprehensive defense-oriented guide mapping four attack surfaces—memory poisoning, cross-prompt injection, jailbreaks, and evasion techniques—to concrete mitigations for agentic AI and RAG deployments.

Key Findings

The guide categorizes AI-specific attack surfaces into memory poisoning (LLM04, LLM08), cross-prompt injection (LLM01), jailbreaks (LLM01/02/05), and evasion techniques targeting input moderation filters.
Memory poisoning targets runtime memory—the dynamic knowledge agents accumulate—rather than training data. Research on the MINJA attack demonstrated injection success rates exceeding 95% in controlled evaluations.
As few as 250 malicious documents may suffice to backdoor LLMs through RAG-based memory poisoning, according to published Anthropic research.
The Agent Security Bench (ASB) benchmark reported over 84% average attack success rate across 27 attack/defense combinations in e-commerce, healthcare, and finance scenarios.
Microsoft outlines defense strategies including trust-aware retrieval with composite trust scores, input sanitization pipelines, and runtime guardrails using Azure AI Content Safety and Prompt Shields.
In practice, attackers chain these techniques: ROT13 encoding or ASCII smuggling evades input filters and delivers cross-prompt injection payloads hidden in documents, which then poison agent memory.
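
To make the chained attack in the findings above concrete, here is a minimal Python sketch of a pre-ingestion scan. It assumes "ASCII smuggling" means text hidden in the invisible Unicode Tags block (U+E0000 to U+E007F), and it checks ROT13-decoded text against a short marker list. The names SUSPICIOUS_PHRASES and scan_document are hypothetical and are not taken from the Microsoft guide; a production system would rely on a maintained classifier such as Prompt Shields rather than string matching.

```python
import codecs

# Unicode "Tags" block: invisible in most renderers, shadows ASCII at an offset.
TAG_BASE = 0xE0000
TAG_PRINTABLE = range(0xE0020, 0xE007F)   # maps to ASCII 0x20..0x7E
TAG_BLOCK = range(0xE0000, 0xE0080)

# Hypothetical marker phrases, for illustration only.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "disregard the system prompt",
)


def extract_smuggled_ascii(text: str) -> str:
    """Decode printable tag characters back to the ASCII they shadow."""
    return "".join(
        chr(ord(ch) - TAG_BASE) for ch in text if ord(ch) in TAG_PRINTABLE
    )


def strip_tag_chars(text: str) -> str:
    """Remove every character from the Tags block, leaving the visible text."""
    return "".join(ch for ch in text if ord(ch) not in TAG_BLOCK)


def scan_document(text: str) -> list:
    """Return findings for hidden text and marker phrases in any decoded view."""
    findings = []
    hidden = extract_smuggled_ascii(text)
    if hidden:
        findings.append(f"ascii_smuggling: {hidden!r}")

    visible = strip_tag_chars(text)
    views = {
        "plain": visible,
        "hidden": hidden,
        "rot13": codecs.decode(visible, "rot_13"),
    }
    for label, view in views.items():
        lowered = view.lower()
        for phrase in SUSPICIOUS_PHRASES:
            if phrase in lowered:
                findings.append(f"{label}: {phrase}")
    return findings
```

Any non-empty result would be a reason to quarantine the document before it reaches a vector store or the context window. The sketch covers only two encodings; base64, homoglyphs, and other evasion methods would need their own decoded views.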

Why It Matters

As enterprises deploy agents with persistent memory, RAG pipelines, and multi-turn reasoning, the attack surface shifts from traditional code vulnerabilities to reasoning-level compromises. An attacker can influence agent behavior across sessions by corrupting its memory stores—a threat model traditional application security tooling was never designed to handle. The guide provides a structured mapping of OWASP LLM Top 10 categories to concrete enterprise defense patterns.

What to Do

Audit your agent's memory architecture: identify which memory types (in-context, episodic, semantic via vector DBs, tool-state caching) your agents use and assess trust boundaries for each.
Implement trust-aware retrieval: score memory entries by source reputation, temporal freshness, and known attack patterns before they are inserted into the context window.
Deploy runtime guardrails that inspect both user input and retrieved context for injection markers before they reach the model.
Treat agent memory stores as high-value assets—monitor for anomalous write patterns, unexpected content insertions, or bulk document ingestion.
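
The trust-aware retrieval step above can be sketched as a weighted composite score. This is an illustration under stated assumptions, not the scoring scheme from the Microsoft guide: the weights, the reputation table, the 30-day half-life, the 0.6 threshold, and the choice to treat fresher entries as more trustworthy are all hypothetical, as are the names MemoryEntry, trust_score, and filter_for_context.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class MemoryEntry:
    text: str
    source: str
    written_at: datetime


# All values below are assumptions chosen for illustration.
SOURCE_REPUTATION = {"internal_wiki": 0.9, "user_upload": 0.5, "web_scrape": 0.2}
ATTACK_PATTERNS = ("ignore previous instructions", "system prompt")
WEIGHTS = {"reputation": 0.5, "freshness": 0.2, "pattern": 0.3}
HALF_LIFE_DAYS = 30.0
TRUST_THRESHOLD = 0.6


def trust_score(entry: MemoryEntry, now: datetime) -> float:
    """Combine reputation, freshness, and pattern checks into a 0..1 score."""
    # Unknown sources get zero reputation.
    reputation = SOURCE_REPUTATION.get(entry.source, 0.0)

    # Exponential decay: the freshness component halves every HALF_LIFE_DAYS.
    age_days = max((now - entry.written_at).total_seconds() / 86400.0, 0.0)
    freshness = 0.5 ** (age_days / HALF_LIFE_DAYS)

    # 1.0 when no known attack pattern appears in the text, else 0.0.
    lowered = entry.text.lower()
    pattern = 0.0 if any(p in lowered for p in ATTACK_PATTERNS) else 1.0

    return (
        WEIGHTS["reputation"] * reputation
        + WEIGHTS["freshness"] * freshness
        + WEIGHTS["pattern"] * pattern
    )


def filter_for_context(entries, now):
    """Keep only entries whose composite score meets the threshold."""
    return [e for e in entries if trust_score(e, now) >= TRUST_THRESHOLD]
```

With these assumed numbers, a clean, fresh entry from internal_wiki scores 0.95 and is kept, while a fresh web_scrape entry containing a known attack pattern scores 0.3 and is dropped. Logging the rejected entries rather than silently discarding them would also feed the write-pattern monitoring recommended above.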
