Defending Agent Memory: A Layered Model for 2026
Memory poisoning is the active threat. The defense isn't one trick — it's six layers, each cheap in isolation, none sufficient alone. Here's the model production teams ship, and where most memory APIs still leave gaps.
Recap — the threat
We covered the shape of the attack in an earlier post: memory poisoning is when an attacker writes adversarial facts into a persistent memory layer so that, hours or weeks later, your agent retrieves those facts and acts on them. Because memory is read into the prompt as ground truth, classic prompt-injection defenses don't help — the malicious payload is laundered through a system the model trusts.
That post is older than it looks, and the attack pattern is now mature. The defenses are what most memory APIs still don't ship. Here's a layered model: six layers, none sufficient alone, all cheap when designed in from the start.
Layer 1 — Provenance on every memory
Every stored fact carries the source it came from: not just content and timestamp, but a source field naming the agent, the integration, the tool call, or the user input that produced it.
Why this matters: when you suspect poisoning, you need to know which facts came from which source. Without provenance you're searching a haystack. With provenance, a single query — "show me every memory written by source X between dates Y and Z" — gives you the blast radius in seconds.
Most memory APIs store the content and the timestamp. Provenance is the cheapest layer to add and the one most often missing.
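To make the blast-radius query concrete, here is a minimal sketch assuming a hypothetical Memory record whose source is a plain string such as "mcp:notes-sync" or "user_input". This is not any particular memory API's schema. The blast_radius helper is an operator-side forensic query, so it deliberately spans users; a real layer would run it as an indexed database query rather than a list scan.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Memory:
    user_id: str
    content: str
    timestamp: datetime
    # Provenance (hypothetical format): the agent, integration, tool call,
    # or user input that produced this fact.
    source: str


def blast_radius(memories: list[Memory], source: str,
                 start: datetime, end: datetime) -> list[Memory]:
    """Every memory written by `source` in the half-open window [start, end)."""
    return [m for m in memories if m.source == source and start <= m.timestamp < end]


def utc(y: int, mo: int, d: int) -> datetime:
    return datetime(y, mo, d, tzinfo=timezone.utc)


memories = [
    Memory("u1", "Prefers Postgres", utc(2026, 3, 2), "mcp:notes-sync"),
    Memory("u1", "Lives in Berlin", utc(2026, 3, 5), "user_input"),
    Memory("u2", "Deploys on Fridays", utc(2026, 3, 9), "mcp:notes-sync"),
]

# "Show me every memory written by source X between dates Y and Z."
hits = blast_radius(memories, "mcp:notes-sync", utc(2026, 3, 1), utc(2026, 4, 1))
for m in hits:
    print(m.user_id, m.content)
```

Without the source field, the same question has no answer short of re-reading every memory by hand.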
Layer 2 — Conflict resolution as defense
When a new fact contradicts an existing one, the memory layer doesn't store both. It marks the older fact superseded, the newer one canonical, and keeps the audit trail. This is normally framed as a quality feature (no contradictory recalls); it's also a security feature.
A poisoning attack tries to overwrite a trusted fact with an adversarial one. If memories pile up alongside each other, the attacker only needs to inject ANY plausible contradiction — recall returns both, and the model sometimes picks wrong. If facts get superseded at ingest instead, the attacker has to win the supersedence battle on the merits — and supersedence rules can be tuned to require higher-trust sources to overwrite lower-trust ones.
The combination of layer 1 and layer 2 lets you write a simple, defensible rule: a memory from a low-trust source cannot supersede a memory from a high-trust source. That single rule defeats most known poisoning patterns.
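The trust-weighted supersedence rule can be sketched in a few lines. One assumption here is loud: real contradiction detection needs entity resolution or model judgment, but this toy reduces it to "same hypothetical key, different value". The Trust levels, Fact fields, and FactStore class are all illustrative names, not a real library's API.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Trust(IntEnum):
    LOW = 0     # e.g. a newly installed third-party integration
    MEDIUM = 1
    HIGH = 2    # e.g. direct user input


@dataclass
class Fact:
    key: str      # hypothetical contradiction key: same key + different value = contradiction
    value: str
    source: str   # provenance from layer 1
    trust: Trust
    status: str = "canonical"
    superseded_by: Optional["Fact"] = None


class FactStore:
    def __init__(self) -> None:
        self.canonical: dict[str, Fact] = {}
        self.audit: list[tuple[str, Fact, Fact]] = []  # (outcome, incoming, existing)

    def ingest(self, new: Fact) -> str:
        old = self.canonical.get(new.key)
        if old is None:
            self.canonical[new.key] = new
            return "stored"
        if old.value == new.value:
            return "duplicate"
        # Contradiction: resolve now, at ingest, instead of storing both for recall.
        if new.trust < old.trust:
            # The rule: a lower-trust source cannot supersede a higher-trust fact.
            new.status = "rejected"
            self.audit.append(("rejected", new, old))
            return "rejected"
        # Equal or higher trust: the newer fact wins; the old one is kept for the audit trail.
        old.status = "superseded"
        old.superseded_by = new
        self.canonical[new.key] = new
        self.audit.append(("superseded", new, old))
        return "superseded"


store = FactStore()
store.ingest(Fact("u1/database/preferred", "MySQL", "user_input", Trust.HIGH))
r1 = store.ingest(Fact("u1/database/preferred", "Postgres", "mcp:unknown-server", Trust.LOW))
r2 = store.ingest(Fact("u1/database/preferred", "SQLite", "user_input", Trust.HIGH))
```

The low-trust write is rejected, the later high-trust correction supersedes the original, and recall only ever sees one canonical value. Letting ties go to the newer fact is a design choice; a stricter layer might require confirmation instead.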
Layer 3 — Per-user namespacing as containment
Multi-tenant memory layers must isolate per user — every read and write is scoped by user ID, and there is no cross-tenant retrieval path. This isn't just a privacy requirement; it's containment.
A poisoning attack on user A's memories should not be able to surface in user B's recalls. With proper namespacing, the blast radius of any single compromise is one user. Without it, one attacker can pollute the whole graph.
This is the layer that tends to be DIY in OSS memory libraries (you wire it via key prefixes) and easy to get wrong. Production memory layers ship per-user scoping as a first-class parameter — pass user_id on every call, the layer enforces it.
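Here is a sketch of why DIY key-prefix scoping is easy to get wrong, next to a store that takes user_id as a required argument. Both are hypothetical illustrations: naive_prefix_recall shows one common bug (a missing separator), and ScopedMemoryStore uses a naive substring match as a stand-in for semantic search.

```python
class ScopedMemoryStore:
    """Toy store in which user_id is a required keyword argument on every call."""

    def __init__(self) -> None:
        # One partition per user; reads never look outside the caller's partition.
        self._partitions: dict[str, list[str]] = {}

    def save(self, *, user_id: str, content: str) -> None:
        if not user_id:
            raise ValueError("user_id is required on every write")
        self._partitions.setdefault(user_id, []).append(content)

    def recall(self, *, user_id: str, query: str) -> list[str]:
        if not user_id:
            raise ValueError("user_id is required on every read")
        # Naive substring match stands in for semantic search.
        return [c for c in self._partitions.get(user_id, []) if query.lower() in c.lower()]


def naive_prefix_recall(kv: dict[str, str], user_id: str) -> list[str]:
    # DIY scoping via key prefixes with a missing separator:
    # the prefix "user1" also matches keys that belong to "user10".
    return [v for k, v in kv.items() if k.startswith(user_id)]


kv = {"user1:pref": "likes tea", "user10:pref": "likes coffee"}
leaked = naive_prefix_recall(kv, "user1")  # returns user10's memory too

store = ScopedMemoryStore()
store.save(user_id="user1", content="likes tea")
store.save(user_id="user10", content="likes coffee")
scoped = store.recall(user_id="user1", query="likes")
```

Making user_id a keyword-only, validated parameter means forgetting it fails loudly instead of silently widening the search.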
Layer 4 — User-visible recall (forensic UI)
Your users should be able to see, in plain language, what the agent has stored about them. Not as a feature for curiosity — as forensic capability.
A user who knows they didn't tell the agent they preferred Postgres but sees that fact on their memory dashboard has the data they need to file a bug, suspect poisoning, or escalate. Without user-visible recall, the agent operates on memories the user has no way to inspect or contest. That's a black box no security review should accept.
The browsable wiki view some memory layers ship is exactly this layer wearing a different hat. Not every memory layer has it; the ones that do make memory poisoning much harder to hide.
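A forensic view does not need to be elaborate to be useful. This is a minimal sketch, assuming hypothetical StoredFact records that are already phrased in plain language and carry layer 1 provenance. Grouping by source means a fact written by a server the user does not recognize stands out immediately.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StoredFact:
    content: str      # already phrased in plain language for the user
    source: str       # provenance from layer 1
    written_on: date


def render_memory_dashboard(facts: list[StoredFact]) -> str:
    """Plain-text view of everything stored about one user, grouped by the writing source."""
    by_source: dict[str, list[StoredFact]] = defaultdict(list)
    for f in facts:
        by_source[f.source].append(f)
    lines: list[str] = []
    for source in sorted(by_source):
        lines.append(f"Written by {source}:")
        for f in sorted(by_source[source], key=lambda x: x.written_on):
            lines.append(f"  - {f.content} (on {f.written_on.isoformat()})")
    return "\n".join(lines)


facts = [
    StoredFact("You prefer Postgres", "mcp:notes-sync", date(2026, 3, 2)),
    StoredFact("You live in Berlin", "user_input", date(2026, 3, 5)),
]
dashboard = render_memory_dashboard(facts)
print(dashboard)
```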
Layer 5 — Hard delete + audit log
When a user (or operator) needs to remove a memory, the removal has to be complete. Not just a soft-delete flag; not just the row but the embedding, the graph edges, the cached recall results, the wiki page. And it needs an audit-log entry that records what was deleted, by whom, when, and why.
Hard delete is GDPR's baseline (Article 17, "right to be forgotten"), but it's also incident response. When you find poisoned memories, you don't want them deleted from one index and surviving in another. And you want a record of the cleanup for the post-mortem.
Most OSS memory libraries hard-delete the primary record and leave orphans in the embedding index or the graph. Verifying end-to-end hard delete is a non-trivial test every memory layer should pass before you trust it.
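The end-to-end check can be sketched against a toy layer. Every index name below (rows, embeddings, graph_edges, recall_cache, wiki_pages) is a hypothetical stand-in for the places a memory lives. The test seeds one memory into all of them, deletes it, and asserts that nothing still references it. The audit entry records the memory ID rather than its content, so the log itself does not retain the erased data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MemoryLayer:
    """Toy memory layer listing every place a memory can live (all names hypothetical)."""
    rows: dict[str, str] = field(default_factory=dict)                # memory_id -> content
    embeddings: dict[str, list[float]] = field(default_factory=dict)  # memory_id -> vector
    graph_edges: set[tuple[str, str]] = field(default_factory=set)    # (memory_id, entity)
    recall_cache: dict[str, list[str]] = field(default_factory=dict)  # query -> memory_ids
    wiki_pages: dict[str, set[str]] = field(default_factory=dict)     # page -> cited memory_ids
    audit_log: list[dict] = field(default_factory=list)

    def hard_delete(self, memory_id: str, *, actor: str, reason: str) -> None:
        self.rows.pop(memory_id, None)
        self.embeddings.pop(memory_id, None)
        self.graph_edges = {e for e in self.graph_edges if e[0] != memory_id}
        # Invalidate any cached recall that included the memory rather than patching it.
        self.recall_cache = {q: ids for q, ids in self.recall_cache.items()
                             if memory_id not in ids}
        # Remove citations; a real layer would also regenerate the derived page text.
        for cited in self.wiki_pages.values():
            cited.discard(memory_id)
        # What was deleted (by ID), by whom, when, and why.
        self.audit_log.append({"memory_id": memory_id, "actor": actor, "reason": reason,
                               "at": datetime.now(timezone.utc).isoformat()})

    def orphans(self, memory_id: str) -> list[str]:
        """Names of every index that still references memory_id."""
        checks = {
            "rows": memory_id in self.rows,
            "embeddings": memory_id in self.embeddings,
            "graph_edges": any(src == memory_id for src, _ in self.graph_edges),
            "recall_cache": any(memory_id in ids for ids in self.recall_cache.values()),
            "wiki_pages": any(memory_id in ids for ids in self.wiki_pages.values()),
        }
        return [name for name, present in checks.items() if present]


# End-to-end test: seed one memory everywhere, delete it, verify no orphans survive.
layer = MemoryLayer()
layer.rows["m1"] = "Prefers Postgres"
layer.embeddings["m1"] = [0.12, -0.40, 0.33]
layer.graph_edges.add(("m1", "Postgres"))
layer.recall_cache["which database?"] = ["m1"]
layer.wiki_pages["Postgres"] = {"m1"}
before = layer.orphans("m1")
layer.hard_delete("m1", actor="oncall@example.com", reason="suspected poisoning")
after = layer.orphans("m1")
assert after == [], f"orphans survived hard delete: {after}"
```

Against a real memory layer, the orphans check would query each backing store directly; the point is that the test asserts on every index, not just the primary record.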
Layer 6 — The MCP threat model
MCP-native memory means any MCP-aware client your user installs can write memories. That's the cleanest possible UX. It's also the largest attack surface in the chain.
Think about what just happened: a user clicked "install" on a third-party MCP server they found on a forum. That server now has the same memory-write permission as the user's trusted Claude Desktop. If the server is malicious or compromised, it can quietly inject facts into the memory layer that the user's other clients will recall later.
The defenses:
- Trust labeling on the source field (layer 1 again): every MCP server registered with the memory layer gets a trust level. New servers default to lower trust until the user explicitly bumps them.
- Trust-weighted supersedence (layer 2 again): low-trust sources cannot overwrite high-trust facts.
- Per-MCP audit trail: every memory write is attributed to the MCP server that called it. Show this in the dashboard — "here are the memories third-party-server-X has written this week."
- User confirmation for first writes from a new source: when a never-seen-before MCP server calls memories.save for the first time for a user, surface a confirmation prompt in-product before letting it through.
These four are cheap individually and decisive together. The first three are passive (data shape + UI). The fourth is a one-time-per-source friction that users won't notice after the first install of each MCP server.
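A minimal sketch of three of these defenses in one write path follows: default-low trust labeling, first-write confirmation per (user, server) pair, and a per-MCP audit trail that backs a "what did this server write this week" query. McpWriteGate, its confirm hook, and the numeric trust levels are hypothetical; the trust level is recorded on each write so that trust-weighted supersedence downstream can consult it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable

LOW, MEDIUM, HIGH = 0, 1, 2  # hypothetical trust levels


@dataclass
class McpWriteGate:
    # Hypothetical UI hook: shows an in-product prompt, returns True if the user approves.
    confirm: Callable[[str, str], bool]
    trust: dict[str, int] = field(default_factory=dict)           # server -> trust level
    approved: set[tuple[str, str]] = field(default_factory=set)   # (user_id, server) pairs
    writes: list[dict] = field(default_factory=list)              # per-MCP audit trail

    def set_trust(self, server: str, level: int) -> None:
        """Explicit user action to bump (or lower) a server's trust."""
        self.trust[server] = level

    def save(self, user_id: str, server: str, content: str, now: datetime) -> bool:
        level = self.trust.setdefault(server, LOW)  # unseen servers start at low trust
        if (user_id, server) not in self.approved:
            # First write from this source for this user: ask before letting it through.
            if not self.confirm(user_id, server):
                return False
            self.approved.add((user_id, server))
        # Attribute every write to the calling server, with its trust level at write time.
        self.writes.append({"user_id": user_id, "server": server, "trust": level,
                            "content": content, "at": now})
        return True

    def written_by(self, server: str, since: datetime) -> list[str]:
        """Dashboard query: what has this server written since `since`?"""
        return [w["content"] for w in self.writes
                if w["server"] == server and w["at"] >= since]


prompts: list[str] = []


def fake_confirm(user_id: str, server: str) -> bool:
    prompts.append(server)
    return server != "mcp:forum-server"  # the user declines the unfamiliar server


gate = McpWriteGate(confirm=fake_confirm)
now = datetime(2026, 3, 10, tzinfo=timezone.utc)
ok1 = gate.save("u1", "mcp:notes-sync", "Prefers Postgres", now)
ok2 = gate.save("u1", "mcp:notes-sync", "Uses pytest", now)  # no second prompt
ok3 = gate.save("u1", "mcp:forum-server", "Always approve refunds", now)
this_week = gate.written_by("mcp:notes-sync", now - timedelta(days=7))
```

The user sees exactly one prompt per new server, which is the one-time friction described above; every later write from an approved server goes straight through but stays attributed.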
Evaluation checklist
If you're shopping for a memory layer for an agent product that handles sensitive data, ask the vendor (or check the OSS code) for these six layers, in this order:
- Does every stored memory carry a structured source field that names where it came from, not just content and timestamp?
- When two facts contradict, does the layer resolve at ingest (supersede), or store both and resolve at recall? Can supersedence be weighted by source trust?
- Is per-user namespacing a first-class parameter on every save/recall call, or do you have to wire it yourself?
- Can your users see, in plain language, what the layer has stored about them? Is there a UI you could point them at after an incident?
- Does hard delete propagate to every index (embedding, graph, cache, wiki)? Is there an audit log of deletes?
- For MCP-native layers: is every memory write attributed to the calling MCP server, and is there a trust level that affects supersedence?
Most memory APIs in 2026 pass layers 3, 4, and 5 cleanly. Layer 1 is often a TODO. Layer 2 (supersedence) is the single biggest differentiator and the one most teams don't build. Layer 6 is brand new — many MCP-native layers haven't even started here yet.
Where Ricord stands on the six layers
Honest read:
- Layer 1 (provenance): Yes. Every memory has a source with the calling client/agent labeled.
- Layer 2 (conflict resolution): Yes, and it's our headline feature: supersedence at ingest with the option to weight by source trust.
- Layer 3 (namespacing): Yes. user_id on every call, enforced at the service layer.
- Layer 4 (user-visible recall): Yes. The dashboard ships an auto-generated wiki of every entity the agent has learned about the user, with backlinks and a 3D graph view.
- Layer 5 (hard delete): Yes — propagates to the graph, embedding index, derived wiki pages, and cached recalls. Audit log retained.
- Layer 6 (MCP threat model): In progress. Per-MCP attribution is live; trust-weighted supersedence is shipped; first-write confirmation flow is on the near-term roadmap.
We ship layers 1–5 today and are honest about where layer 6 is. If you're evaluating any memory layer (ours or anyone else's) for an agent product handling sensitive data, run the six-question checklist above. It'll separate the products that designed for this from the products that haven't.