MCP Memory Server: A Developer's Guide
What an MCP memory server actually does, the minimum-viable version (three tools), and what separates a toy from a production one. Plus a 60-second install path that wires Claude, Cursor, and Codex to the same memory.
Why memory belongs at the MCP layer
The Model Context Protocol (MCP) is the cleanest abstraction the AI ecosystem has produced in the last two years. The model gets a menu of tools by name; you bring the tools. Read a file. Run a shell command. Search the web. Each capability typically lives in its own server.
Memory belongs in that menu. The model can decide on its own when to recall context ("before I answer this, have I been told something about this codebase?") and when to store something new ("I should remember the user chose Postgres for the new service"). You don't engineer the prompt to fake persistence — the model calls recall the same way it calls read_file.
The benefit isn't just architectural cleanliness. It's client coverage. The MCP memory server you install for Claude Desktop is the same server Cursor talks to, the same one Codex talks to, the same one your custom Python agent talks to. One brain, every AI.
The minimum-viable MCP memory server
Strip the idea to its absolute floor and you get three tools:
remember(fact, scope?): store a fact, optionally scoped to a project or user.
recall(query, scope?): fetch facts relevant to a query. Returns a ranked list.
forget(id): hard-delete a specific stored fact.
That's the floor. You can build a working memory layer with three tools and a Postgres table, and for a personal-scale agent it's genuinely enough. Embed the query, vector-search it against the stored facts, return the top-k. Done.
For about a week.
Where the minimum-viable version breaks
Three failure modes show up in the first month of real use:
1. Recall gets noisy
Vector similarity on 50,000 stored facts returns a lot of "topically similar" chunks that aren't actually answering the question. The agent asks "what database are we using?" and gets six fragments about database performance from a tutorial it read in March. The signal-to-noise ratio falls off a cliff somewhere around the 10,000-fact mark.
2. Facts go stale and you store both versions
You said "we use Postgres" in February. You said "we migrated to SQLite" in April. Both are in the database now. The agent quotes whichever one wins cosine similarity on a given query. You catch it once and correct it, and now the contradiction spans three stored facts.
3. You can't see what it knows
The memory layer is opaque. You don't know what your agent thinks about your codebase, your customers, your design decisions. There's no "open the dashboard and read it" surface. You only see the memory through the agent's responses, which is exactly when it's too late to fix.
What makes a production-grade MCP memory server
The version that survives past month one has five additional things beyond remember / recall / forget:
1. A knowledge graph layered over the vector store
Entities are nodes; relationships are typed edges between them. Retrieval walks edges ("the deploy script for the API service") instead of guessing from cosine similarity alone. The graph is where you get actual factual recall, not just topical recall.
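A minimal sketch of edge-walking retrieval, assuming an upstream step (an LLM or a parser, not shown) has already turned "the deploy script for the API service" into a start entity plus a relation path. Entities are strings, relationships are named directed edges, and aliases resolve to one canonical node. `KnowledgeGraph` and relation names such as `has_deploy_script` are hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Entities as nodes, named relationships as directed edges."""

    def __init__(self) -> None:
        # subject -> relation -> set of objects
        self.edges: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
        self.aliases: dict[str, str] = {}

    def add_alias(self, alias: str, entity: str) -> None:
        self.aliases[alias.lower()] = entity

    def resolve(self, name: str) -> str:
        """Map an alias to its canonical entity id; unknown names map to themselves."""
        return self.aliases.get(name.lower(), name)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.edges[self.resolve(subject)][relation].add(self.resolve(obj))

    def walk(self, start: str, path: list[str]) -> set[str]:
        """Follow a chain of relation names from `start`, one hop per relation."""
        frontier = {self.resolve(start)}
        for relation in path:
            frontier = {
                nxt
                for node in frontier
                for nxt in self.edges.get(node, {}).get(relation, set())
            }
        return frontier
```

With `"API service"` aliased to one node, `walk("api service", ["has_deploy_script"])` returns exactly the linked script, and a two-hop path such as `["has_deploy_script", "owned_by"]` answers "who owns the API deploy script" without any similarity ranking.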
2. Conflict resolution at ingest time
When a new fact contradicts a stored one, the system detects it, deprecates the old one with a timestamp, and stores the new one as canonical. The agent never has to choose between two conflicting answers because there's only one canonical answer at any given moment, with the history preserved.
3. Browsable wiki pages per entity
Every entity in the graph gets an auto-generated markdown wiki page with backlinks, aliases, and a contradiction history. You can read what the memory layer thinks it knows, the same way you'd read an Obsidian vault. Without this surface, you have no way to audit what your agent is actually working from.
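A sketch of rendering one entity page, assuming the graph already supplies the aliases, facts, backlinks, and superseded values. The `[[Name]]` wikilink syntax is the one Obsidian uses for links and backlinks; the page layout and the `render_wiki_page` signature are hypothetical.

```python
from datetime import date


def render_wiki_page(
    entity: str,
    aliases: list[str],
    facts: list[str],
    backlinks: list[str],
    contradictions: list[tuple[str, str, date]],  # (old value, new value, when)
) -> str:
    """Render one entity as an Obsidian-style markdown page."""
    lines = [f"# {entity}", ""]
    if aliases:
        lines += [f"Aliases: {', '.join(sorted(aliases))}", ""]
    lines += ["## Facts"] + [f"- {fact}" for fact in facts] + [""]
    lines += ["## Backlinks"] + [f"- [[{name}]]" for name in sorted(backlinks)] + [""]
    if contradictions:
        lines.append("## Contradiction history")
        for old, new, when in contradictions:
            lines.append(f"- {when.isoformat()}: {old!r} superseded by {new!r}")
    return "\n".join(lines).rstrip() + "\n"
```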
4. Sub-second recall
If recall takes seven seconds, the agent will stop calling it (or you'll stop using the agent). A production memory layer should hold sub-second p50 recall even past hundreds of thousands of facts.
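To check your own recall path against that bar, time it over a representative query set and read the median and the tail. A sketch: `recall` is whatever callable wraps your server's recall tool, `latency_percentiles` is a hypothetical helper, and the one-second figure is this article's threshold, not a standard.

```python
import statistics
import time
from typing import Callable


def latency_percentiles(recall: Callable[[str], object], queries: list[str],
                        repeats: int = 3) -> dict[str, float]:
    """Time each recall call; report p50 and p95 wall-clock latency in seconds."""
    samples: list[float] = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            recall(query)
            samples.append(time.perf_counter() - start)
    if len(samples) < 2:
        raise ValueError("need at least two timed calls")
    # 99 cut points; the inclusive method interpolates within the observed range.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": statistics.median(samples), "p95": cuts[94]}
```

For example, `latency_percentiles(store.recall, sample_queries)["p50"] < 1.0` is the check the paragraph above describes, run against a store sized like production.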
5. Real deletion
forget needs to hard-delete from the vector store, the graph, every index, every cached embedding. Not soft archive. The day a user reports a poisoned memory is the day you wish your deletion model had been the right one.
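A sketch of what "hard-delete from every store" means in practice: one fact id is purged from the text table, the vector store, the graph, a keyword index, and the embedding cache, and an audit helper confirms nothing still references it. `MemoryBackends`, `hard_forget`, and `references` are hypothetical; in a real system each of these dicts is a separate store that needs its own delete.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryBackends:
    """Every place a fact can live (in-memory stand-ins for real stores)."""
    texts: dict[str, str] = field(default_factory=dict)                    # fact_id -> text
    vectors: dict[str, list[float]] = field(default_factory=dict)          # fact_id -> embedding
    graph_edges: set[tuple[str, str, str]] = field(default_factory=set)    # (entity, relation, fact_id)
    keyword_index: dict[str, set[str]] = field(default_factory=dict)       # token -> fact_ids
    embedding_cache: dict[str, list[float]] = field(default_factory=dict)  # text -> embedding


def hard_forget(b: MemoryBackends, fact_id: str) -> bool:
    """Remove fact_id from every backend; no soft-archive flag anywhere."""
    text = b.texts.pop(fact_id, None)
    if text is None:
        return False
    b.vectors.pop(fact_id, None)
    b.embedding_cache.pop(text, None)
    b.graph_edges = {edge for edge in b.graph_edges if edge[2] != fact_id}
    for token in list(b.keyword_index):
        b.keyword_index[token].discard(fact_id)
        if not b.keyword_index[token]:
            del b.keyword_index[token]  # drop tokens that now point at nothing
    return True


def references(b: MemoryBackends, fact_id: str) -> bool:
    """Audit helper: does any backend still mention fact_id?"""
    return (
        fact_id in b.texts
        or fact_id in b.vectors
        or any(edge[2] == fact_id for edge in b.graph_edges)
        or any(fact_id in ids for ids in b.keyword_index.values())
    )
```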
Useful extras beyond the three-tool floor
Once the production-grade foundations are in, the real value comes from a richer toolset:
correct(id, new_fact): fix a stored fact in place, preserving the history.
get_context(scope?): pull the agent's standing instructions and preferences (procedural memory) instead of just declarative facts.
wiki_recall(query): return the relevant wiki page rather than fragments. Useful when the agent needs structured background, not a list of facts.
wiki_pages_for_file(path): given a file the agent is editing, return the wiki entities that mention it. Massive context-window saver for coding agents.
graph_stats(): let the agent see the shape of the memory (how many entities, how many edges, how many contradictions pending).
A good MCP memory server ships 10–15 tools that compose. A great one keeps the surface area small enough that the model picks the right tool without prompting tricks.
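Two of these extras, sketched under simple assumptions: `correct` overwrites a fact's text but first pushes the old text onto a timestamped revision list, and `wiki_pages_for_file` is a lookup in an index from normalized file paths to the entity pages that mention them. How mentions get indexed (for example, during wiki generation) is assumed; `ExtrasSketch`, `StoredFact`, and `index_mention` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import PurePosixPath


@dataclass
class StoredFact:
    text: str
    revisions: list[tuple[datetime, str]] = field(default_factory=list)  # (replaced_at, old_text)


class ExtrasSketch:
    def __init__(self) -> None:
        self.facts: dict[str, StoredFact] = {}
        self.file_mentions: dict[str, set[str]] = {}  # normalized path -> entity names

    def correct(self, fact_id: str, new_fact: str) -> StoredFact:
        """Fix a fact in place; the previous text survives in `revisions`."""
        fact = self.facts[fact_id]  # raises KeyError for an unknown id
        fact.revisions.append((datetime.now(timezone.utc), fact.text))
        fact.text = new_fact
        return fact

    @staticmethod
    def _normalize(path: str) -> str:
        # PurePosixPath drops "." segments and duplicate slashes ("./src//db.py" -> "src/db.py").
        return str(PurePosixPath(path))

    def index_mention(self, entity: str, path: str) -> None:
        self.file_mentions.setdefault(self._normalize(path), set()).add(entity)

    def wiki_pages_for_file(self, path: str) -> list[str]:
        return sorted(self.file_mentions.get(self._normalize(path), set()))
```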
The 60-second install
Whichever MCP memory server you pick (self-host or hosted), install follows the same shape:
# Install
bun add -g <memory-server-package>

# Wire it into your AI clients
<memory-server-package> install   # auto-detects Claude Desktop, Claude Code, Cursor, Codex,
                                  # Gemini CLI, and any MCP-compatible client

# Restart your client
# That's it.
The MCP config block lands in each client's own config file: claude_desktop_config.json for Claude Desktop (under ~/Library/Application Support/Claude/ on macOS, %APPDATA%\Claude\ on Windows), .mcp.json for Claude Code, .cursor/mcp.json for Cursor. Each client re-reads its config on restart. From that point on, the model has memory tools in its menu and decides on its own when to call them.
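Under the hood, an installer like this mostly merges one entry into the client's JSON config. Claude Desktop, Claude Code, and Cursor all read an `mcpServers` object that maps a server name to the `command` and `args` used to launch it. A sketch of that merge: `add_mcp_server` is hypothetical, the server name and command in the test are placeholders, and clients with a different config format are out of scope.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Merge one server into an mcpServers-style JSON config, keeping other entries."""
    existing = config_path.read_text() if config_path.exists() else ""
    config = json.loads(existing) if existing.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}  # overwrite only this entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Merging rather than overwriting matters because the same file usually already lists the user's other servers (file system, shell, search).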
If you're building your own MCP memory server, start with the official MCP SDK in TypeScript or Python and implement the three-tool floor first. The hard parts come later: the graph layer, the conflict resolver, the wiki generation. Most self-built servers ship the three-tool floor, then spend the following months building the production-grade layers on top of it.
Why MCP-native memory matters now
For the first two years of the LLM era, "memory" meant stuffing as much context as possible into a 200k-token window and praying. The pattern was: keep a chat history in your backend, replay relevant slices on each turn, accept that the model would lose the thread past session boundaries.
MCP changes the shape. The model gets memory the way it gets file-system access — as a first-class capability it can reach for whenever it needs to. You stop engineering the prompt to fake persistence and start engineering the memory layer to be a good citizen. The agent gets simpler. The persistence gets more reliable. And because every MCP-compatible client uses the same server, the memory follows you across tools.
This is the shape persistent AI memory should have taken from the beginning. We just needed the protocol.
Where Ricord fits
We built Ricord as an MCP-native memory server with all five production-grade properties shipped: knowledge graph, conflict resolution at ingest, auto-generated wiki pages per entity, sub-second recall, and real hard delete. It exposes 13 tools: remember, recall, correct, forget, get_context, wiki_recall, wiki_pages_for_file, graph_stats, and a few more. Drop it in once, and every MCP-compatible client gets the same memory.
bun add -g ricord
ricord login
ricord install   # auto-detects Claude Code, Claude Desktop, Codex, Cursor
Restart your client, ask it to remember something, ask again tomorrow. Wikis populate as you work — by week two you can open the dashboard and read what your AI has learned about your codebase, your customers, your projects.