Memory Poisoning — The New Attack Surface That Beats Prompt-Injection Defenses
AI relevance: Memory poisoning targets the persistence layer of production LLM agents — malicious instructions injected into long-term memory trigger weeks later during retrieval, bypassing all prompt-layer defenses and behaving as viral propagation in multi-agent systems.
What Changed
For two years, AI security focused on prompt injection: sanitize input, validate output, scope the conversation. The attack was real but stateless — when the session ended, the attack ended. In early 2026, OWASP classified a new threat in its Top 10 for Agentic Applications: ASI06 — Memory Poisoning.
Memory poisoning operates on a three-phase lifecycle:
- Injection: A malicious instruction enters via any external data source — a compromised PDF, a manipulated inbound email, a poisoned knowledge-base article.
- Persistence: The instruction is written to the agent's long-term semantic memory, where it blends into the "learned" behavioral profile.
- Execution: Weeks or months later, the agent retrieves the poisoned memory as trusted historical context, triggering data exfiltration, unaligned behavior, or unauthorized tool invocation.
The structural difference is the time gap. Stateless prompt injection is detected in the same session it's delivered. Memory poisoning's payload is compressed, translated, and embedded deeply into the retrieval store before it ever fires. By the time the agent acts on it, the malicious instruction has gained what one analysis calls "the credibility of memory itself."
Attack Surface
- Preference memory: Can be poisoned to force trust in a malicious vendor or disable safety checks.
- Experience memory: Corrupted so future tasks blindly imitate a flawed procedural trajectory.
- Shared memory orchestration: In multi-agent environments using distributed context, a single poisoned peer infects the entire network via routine message passing — described as "explicitly viral" and network-worm-shaped.
Why It Matters
- The defenses that worked for prompt injection don't transfer. Input sanitization happens upstream of the memory write. Output validation happens downstream of retrieval. Memory poisoning bypasses both by writing past the prompt layer entirely.
- The exact feature that makes agents powerful — their ability to learn and remember — has been weaponized into their primary attack surface.
- Academic work on attack and defense models (arXiv 2601.05504) formalized trust scoring and temporal decay mechanisms that are now being deployed in production middleware.
- The BEAM 10-million-token benchmarks exposed the limits of context expansion, pushing defense into the memory layer rather than the prompt layer.
What to Do
- Audit which memory stores your agents write to and read from — map the full persistence surface.
- Implement validation gates on memory writes from any external data source (PDFs, emails, KB articles, API responses).
- Deploy temporal decay: older and unverified memory entries should carry less weight at retrieval time.
- For multi-agent systems, isolate memory sharing between agents and validate peer-to-peer memory transfers.
- Set up forensic snapshots of agent memory state so you can rollback to a known-good cognitive state if poisoning is detected.