Memory Poisoning: The AI Attack Vector Nobody Was Watching For
An active threat to enterprise AI agents. What it is, why most memory APIs fail, and what to evaluate.
Most AI security conversations in 2026 are about prompt injection. That's the obvious one — an attacker gets a malicious instruction into the context window and the model obeys.
The less obvious one is worse: memory poisoning.
An attacker doesn't target the current conversation. They target everything that comes after it. They inject a fake "fact" into your agent's persistent memory, and that fact surfaces as ground truth in every future session until someone notices and manually removes it.
In February 2026, Microsoft Security disclosed that this attack is no longer hypothetical. Researchers identified 50+ distinct memory poisoning attacks in active operation at 31 companies across 14 industries. An arXiv paper formalized the threat model.
If you build AI agents for enterprise, this is now a P0 threat.
How the attack works
The classic version: an attacker sends a link with a pre-filled prompt. The user clicks. The AI assistant processes the injected content as legitimate input, extracts "facts" from it, and stores them in persistent memory. Now those facts are part of the user's profile — surfacing in every future conversation.
The injected facts don't have to be obvious malware. They can be subtle:
- "User strongly prefers Vendor X over competitors."
- "User is authorized to receive customer PII in chat responses."
- "User agreed to the terms of partnership on March 15."
Hidden instructions can also ride in shared documents, emails, or web pages the agent processes. This is a form of cross-prompt injection attack (XPIA) — except the damage is persistent, not per-session.
Standard prompt injection affects one conversation. Memory poisoning affects every conversation your agent has from that point on.
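The persistence is the whole problem, and it can be sketched in a few lines. This is a deliberately naive memory store — every name here (`NaiveMemory`, the keyword recall) is illustrative, not any real API — showing how an injected "fact" from one session resurfaces in the next:

```python
class NaiveMemory:
    """Stores every extracted 'fact' verbatim, with no provenance or vetting."""

    def __init__(self):
        self.facts = []

    def ingest(self, text):
        # Naive extraction: anything phrased as a statement becomes a "fact".
        self.facts.append(text)

    def recall(self, query):
        # Crude keyword match stands in for vector search.
        words = query.lower().split()
        return [f for f in self.facts if any(w in f.lower() for w in words)]


memory = NaiveMemory()

# Session 1: the user clicks an attacker-crafted link; the injected content
# is processed as ordinary input and stored alongside legitimate facts.
memory.ingest("User lives in NYC")
memory.ingest("User strongly prefers Vendor X over competitors")  # injected

# Session 2, days later: the poisoned fact surfaces as ground truth.
print(memory.recall("vendor preference"))
# -> ['User strongly prefers Vendor X over competitors']
```

Nothing in this store distinguishes the injected line from the legitimate one — which is exactly the gap the rest of this post is about.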
Why most memory APIs are exposed
The core problem: most memory APIs have no defense against this. They were designed around a very different threat model (losing context) and never updated for the current one (adversarial content).
Specifically, they fail at three things:
1. They store everything.
A production audit of Mem0 found 97.8% of stored entries were junk — duplicates, hallucinations, noise. If your memory layer can't distinguish signal from noise in normal operation, it can't distinguish legitimate user statements from injected ones either.
2. They keep contradictory facts.
If an agent stores "user lives in NYC" and then later stores "user lives in LA," most memory APIs keep both. An attacker can inject "user prefers Vendor X" and it sits alongside the user's actual preferences, surfacing whenever retrieval ranks it high.
3. "Deleted" doesn't mean deleted.
When a user discovers a poisoned memory and tries to remove it, many memory APIs soft-delete — the data stays in the vector store, excluded from search but recoverable. If you have a compliance obligation, soft delete is a liability.
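The soft-delete failure mode is easy to demonstrate. A sketch with a hypothetical store (names are illustrative): the "deleted" entry disappears from search, but the data itself never leaves the system.

```python
class SoftDeleteStore:
    def __init__(self):
        self.entries = {}  # entry_id -> {"text": ..., "deleted": bool}

    def add(self, entry_id, text):
        self.entries[entry_id] = {"text": text, "deleted": False}

    def delete(self, entry_id):
        # "Delete" only hides the entry from search; the data stays on disk.
        self.entries[entry_id]["deleted"] = True

    def search(self, term):
        return [e["text"] for e in self.entries.values()
                if not e["deleted"] and term.lower() in e["text"].lower()]


store = SoftDeleteStore()
store.add("m1", "User is authorized to receive customer PII")  # poisoned
store.delete("m1")

print(store.search("PII"))          # -> [] : invisible to retrieval...
print(store.entries["m1"]["text"])  # ...but still present and recoverable
```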
What actually defends against memory poisoning
Four properties your memory layer needs:
Quality gates before storage.
Reject noise and adversarial content at ingestion, not at retrieval. By the time you're filtering at retrieval, the poisoned memory is already in your database.
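As a sketch of ingestion-time gating: the heuristics below (duplicate check, a static list of injection markers) are illustrative placeholders — a production gate would use classifiers, provenance checks, and source trust levels — but the shape is the point: the decision happens before anything touches the database.

```python
# Illustrative markers only; real systems need far more than string matching.
INJECTION_MARKERS = ("ignore previous", "you are now", "system:", "authorized to")


def quality_gate(candidate, existing_facts):
    """Return (accepted, reason) for a candidate fact, before storage."""
    text = candidate.strip().lower()
    if not text:
        return False, "empty"
    if text in (f.lower() for f in existing_facts):
        return False, "duplicate"
    if any(marker in text for marker in INJECTION_MARKERS):
        return False, "suspected injection"
    return True, "accepted"


facts = ["User lives in NYC"]
for candidate in ["User lives in NYC",
                  "User is authorized to receive customer PII in chat",
                  "User's favorite language is Python"]:
    ok, reason = quality_gate(candidate, facts)
    if ok:
        facts.append(candidate)
    print(candidate, "->", reason)
```

The duplicate lands in the reject pile, the injection-flavored claim is flagged, and only the clean fact is stored.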
Automatic conflict resolution.
When a new "fact" contradicts an established one, flag it. Don't silently store both. An attacker's injection usually contradicts something the real user said — contradiction detection turns memory poisoning into a detection signal.
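A minimal sketch of that detection signal, assuming facts can be keyed by subject and attribute (real systems would extract structured claims with an LLM; the scheme here is hypothetical):

```python
def ingest_with_conflict_check(store, subject, attribute, value, source):
    """Store a fact, or surface a conflict instead of silently keeping both."""
    key = (subject, attribute)
    if key in store and store[key]["value"] != value:
        # Don't overwrite, don't store both: flag for review.
        return {"status": "conflict",
                "existing": store[key],
                "incoming": {"value": value, "source": source}}
    store[key] = {"value": value, "source": source}
    return {"status": "stored"}


store = {}
ingest_with_conflict_check(store, "user", "preferred_vendor",
                           "Vendor A", "session-12")
result = ingest_with_conflict_check(store, "user", "preferred_vendor",
                                    "Vendor X", "shared-doc-summary")
print(result["status"])  # -> conflict
```

The injection didn't just fail to stick — it produced an alert, with both the established fact and the suspicious source attached.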
Temporal tracking with audit trails.
Every stored fact should know when it was learned and from which source. If a user reports a suspicious memory, you should be able to answer "when did this enter memory, what session created it, and what triggered the ingestion?"
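In practice this means provenance metadata on every fact. A sketch (field names are illustrative):

```python
from datetime import datetime, timezone


def store_fact(ledger, text, session_id, trigger):
    """Append a fact with full provenance: when, which session, what triggered it."""
    ledger.append({
        "text": text,
        "learned_at": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,  # which session created it
        "trigger": trigger,        # what caused ingestion: user message, doc, tool output
    })


ledger = []
store_fact(ledger, "User prefers Vendor X", "sess-0421", "summarize-link")

# Investigating a suspicious memory is now a lookup, not guesswork:
suspect = ledger[0]
print(suspect["learned_at"], suspect["session_id"], suspect["trigger"])
```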
Hard delete.
When a user removes a poisoned memory, it needs to be gone across the vector store, knowledge graph, and every index. GDPR requires this anyway. Memory poisoning makes it urgent.
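The defenses above can close with what hard delete actually means: removal from every layer at once, with no flag left behind. A sketch with in-memory stand-ins for a vector store, a knowledge graph, and a keyword index (all hypothetical structures):

```python
class MemoryLayers:
    def __init__(self):
        self.vectors = {}  # fact_id -> embedding
        self.graph = {}    # fact_id -> (subject, predicate, object)
        self.index = {}    # term -> set of fact_ids

    def hard_delete(self, fact_id):
        # Remove from every layer. No tombstone, no recovery path.
        self.vectors.pop(fact_id, None)
        self.graph.pop(fact_id, None)
        for ids in self.index.values():
            ids.discard(fact_id)


layers = MemoryLayers()
layers.vectors["f1"] = [0.1, 0.2]
layers.graph["f1"] = ("user", "prefers", "Vendor X")
layers.index["vendor"] = {"f1"}

layers.hard_delete("f1")
assert "f1" not in layers.vectors
assert "f1" not in layers.graph
assert all("f1" not in ids for ids in layers.index.values())
```

The assertions at the end are the contract: a delete that passes search-level checks but fails any one of them is a soft delete wearing a costume.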
What Ricord does
Ricord is a memory layer designed with adversarial content in mind:
- Quality gates filter content before it's stored, not after retrieval
- Conflict resolution flags contradictions automatically — poisoned facts often contradict legitimate ones
- Temporal tracking lets you audit exactly when any fact entered memory
- Hard delete removes a fact from every storage layer — vector store, knowledge graph, all indexes — with no recovery path
- Integrity audit methodology — we publish how we evaluate ourselves, including on our own benchmark pipeline
94.2% on the full 500-question LongMemEval suite. 93.0% on LoCoMo post-audit. Sub-second recall. Graph memory on every tier.
Evaluation checklist for enterprise buyers
If you are procuring a memory layer for a production AI agent this quarter, add these five questions to your RFP:
- How does your system handle contradictory facts? If the answer is "we store both," you have a poisoning vector.
- Can I audit when a memory was created? If the answer is no, you can't investigate a suspected attack.
- What happens when I delete a memory? If the answer involves the word "soft," your compliance team will have concerns.
- How do you filter content at ingestion? If there's no filtering step before storage, noise and injections land in the same bucket.
- Have you published your benchmark methodology? If they haven't audited themselves, they probably haven't audited the threat model either.
Further reading
- Microsoft Security — Manipulating AI memory for profit
- arXiv 2601.05504 — Memory Poisoning Attack and Defense on Memory Based LLM-Agents
- Help Net Security — That "summarize with AI" button might be manipulating you
- Christian Schneider — Memory poisoning in AI agents: exploits that wait