Engineering · Published June 2, 2026 · 12 min read

Multi-Tenant AI Memory at 10k Users: A Production Playbook

Memory works great in the demo. Then you have 10,000 paying users, and the same patterns that shipped your MVP become a series of expensive failure modes. A walk through the five problems that show up at scale — isolation, noisy neighbors, recall cost, GDPR delete, and per-tenant budget — and what each layer of the stack should do about them.

The shift from one user to ten thousand

When you ship the first version of an agent with memory, you store everything in one place. One vector index, one graph, one set of facts. The agent recalls well. Demos go great. You ship.

Around the time your product crosses 1,000 active users, recall quality starts drifting. By 10,000 users, you have five distinct problems that all look like "the memory is broken," and no two of them share a root cause. It's worth knowing the failure modes before you hit them.

Problem 1 — Isolation

The first thing that breaks is also the most embarrassing: user A's memory leaks into user B's recall.

Usually because the single-tenant prototype filtered by user_id in the application layer, not the storage layer. Application-layer filtering is fine until it's not: a missed filter, a default query, a background job, a debug command, a developer running an unfiltered SELECT against production. Any one of those can surface a fact that belongs to someone else.

What to do about it: push the tenant boundary down to the storage layer. The query API should require user_id as a parameter and refuse to run without it. The vector index should be physically partitioned per tenant (or per tenant cohort if the user count makes that impossible). The graph database should enforce per-tenant subgraphs. Treat "no tenant specified" as a 400-class error, not a default-to-all behavior.
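As a rough illustration of "refuse to run without a tenant," here is a minimal in-memory sketch. The class and error names (TenantScopedStore, MissingTenantError) are hypothetical, and the per-tenant dictionary stands in for whatever physical partitioning your storage engine offers; it is not any particular product's API.

```python
from dataclasses import dataclass, field


class MissingTenantError(ValueError):
    """Raised when a call arrives without a tenant; map this to an HTTP 400."""


@dataclass
class TenantScopedStore:
    # One partition per tenant. A hypothetical stand-in for a physically
    # partitioned index, schema, or namespace.
    _partitions: dict = field(default_factory=dict)

    def _partition(self, user_id: str) -> list:
        # There is no "all tenants" fallback: an empty or missing user_id fails.
        if not user_id:
            raise MissingTenantError("user_id is required")
        return self._partitions.setdefault(user_id, [])

    def save(self, user_id: str, fact: str) -> None:
        self._partition(user_id).append(fact)

    def recall(self, user_id: str, query: str) -> list:
        # Only this tenant's partition is ever scanned.
        needle = query.lower()
        return [f for f in self._partition(user_id) if needle in f.lower()]
```

The point of the design is that the tenant check sits on the only path to the data, so a forgotten filter in application code cannot widen the scan.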

What "done" looks like: a pen test where the tester is given a valid API key for user B and probes every documented and undocumented endpoint to surface user A's data. The test passes when no endpoint returns cross-tenant data, even when intentionally misused.

Problem 2 — Noisy neighbors

Tenant 47 has 8 million memories. Tenant 482 has 12. Tenant 47 hammers your write endpoint at 200 req/s while tenant 482 makes a recall call once an hour. Whose latency suffers when 47 goes off?

In a naive setup: everyone's. Background extraction queues fill up, the embedding service backpressures, the graph writer locks. Tenant 482's rare recall call takes 4 seconds because tenant 47 is in the middle of an 8M-row backfill.

What to do about it: per-tenant rate limits at the API edge. Per-tenant queues for background work (separate workers per tenant cohort, not a global FIFO). A per-tenant token bucket on embedding calls. Recall queries should bound their scan size and time out rather than fail open.
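A per-tenant token bucket can be sketched in a few lines. The names here (TokenBucket, PerTenantLimiter) and the rate and capacity values are illustrative assumptions; a production limiter would usually live in shared storage rather than process memory.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBucket:
    rate: float       # tokens refilled per second
    capacity: float   # maximum burst size
    tokens: float
    updated: float    # timestamp of the last refill

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the elapsed time, capped at capacity, then try to spend.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerTenantLimiter:
    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self._buckets = {}

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            # Each tenant gets its own bucket, starting full.
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self._buckets[tenant_id] = bucket
        return bucket.allow(now)
```

Because each tenant drains only its own bucket, a burst from one tenant is rejected at the edge without consuming capacity that a quiet tenant's occasional call depends on.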

What "done" looks like: p99 recall latency for any tenant is independent of any other tenant's write volume. You can demonstrate this with a synthetic load test where one synthetic tenant writes at peak rate while another reads at low rate; the reader's p99 doesn't move.

Problem 3 — Recall cost growing super-linearly

Recall at 1k users is "query the index, return top-k." Recall at 10k users where each user has 50k facts is the same shape but ~500× more data — and naive vector retrieval scales roughly with corpus size. You can hit seconds-per-recall and a per-call cost that breaks unit economics before you notice.

The bigger problem isn't the recall latency. It's the cost stack: every recall call burns embedding compute for the query, retrieval compute against the index, optional rerank compute on candidates, and prompt-token cost when the recalled context gets passed to the model. Multiply by N agent turns per user per day, multiply by tenant count, and you have your AWS bill.
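The multiplication above is easy to make concrete. The sketch below is only arithmetic: every price is a made-up placeholder, and RecallCost and daily_cost are hypothetical names, so substitute your own measured per-call numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecallCost:
    # Hypothetical placeholder prices, in USD per recall call.
    embed: float
    retrieve: float
    rerank: float
    prompt_tokens: float

    @property
    def per_call(self) -> float:
        return self.embed + self.retrieve + self.rerank + self.prompt_tokens


def daily_cost(cost: RecallCost, turns_per_user: int, users: int,
               recall_rate: float = 1.0) -> float:
    """Per-call cost x turns per user per day x users x share of turns that recall."""
    return cost.per_call * turns_per_user * users * recall_rate


# Placeholder figures only: $0.002 per call, 20 turns a day, 10,000 users.
example = RecallCost(embed=0.0001, retrieve=0.0002, rerank=0.0005, prompt_tokens=0.0012)
always_recall = daily_cost(example, turns_per_user=20, users=10_000)
selective_recall = daily_cost(example, turns_per_user=20, users=10_000, recall_rate=0.25)
```

With these placeholder prices, recalling on every turn comes to about $400 a day, and recalling on a quarter of turns to about $100. The recall_rate term is the lever the application controls.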

What to do about it:

Cache recall responses with tenant-scoped keys. A high cache-hit rate for a chat assistant is normal — users ask similar questions in the same session.
Cap recall top-k aggressively and rely on rerank to compensate for the smaller candidate set rather than bumping k.
Recall only when the model actually needs it. Agent loops that always call recall() at every turn waste 80% of the budget on calls the model would have answered without context.
Move from per-fact embedding to per-fact-cluster embedding for static facts that don't change after extraction — fewer vectors, smaller index, cheaper retrieval.
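The first item in the list above, a recall cache with tenant-scoped keys, might look like the following sketch. TenantRecallCache and its methods are hypothetical names, the TTL is an arbitrary assumption, and the recall argument is whatever function actually queries your memory layer.

```python
import hashlib
import time
from typing import Callable, Optional


class TenantRecallCache:
    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        # tenant_id -> {query digest -> (stored_at, results)}
        self._by_tenant = {}

    @staticmethod
    def _digest(query: str) -> str:
        # Normalize case and whitespace so trivially different queries share an entry.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_recall(self, tenant_id: str, query: str,
                      recall: Callable[[str, str], list],
                      now: Optional[float] = None) -> list:
        now = time.monotonic() if now is None else now
        entries = self._by_tenant.setdefault(tenant_id, {})
        key = self._digest(query)
        hit = entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        results = recall(tenant_id, query)
        entries[key] = (now, results)
        return results

    def purge_tenant(self, tenant_id: str) -> None:
        # Drops every cached response for one tenant, e.g. on a delete request.
        self._by_tenant.pop(tenant_id, None)
```

Keeping a separate map per tenant means two tenants can never share an entry even for an identical query, and a whole tenant can be purged in one step.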

What "done" looks like: your cost-per-active-user metric is flat or declining over the quarter as tenant count grows, not climbing.

Problem 4 — GDPR delete

A user clicks "delete my data." What happens?

In an unprepared system: a row is flagged in the primary store. The vector index still has the user's embeddings. The graph has the entity nodes and edges. The derived wiki pages still summarize the deleted facts. Cached recall responses still contain the deleted content. The audit log retained for compliance still has the raw messages. You've technically failed the "right to be forgotten" requirement.

What to do about it: design delete as a propagating operation across every storage layer from day one. A delete request kicks off a saga that:

Removes the rows from the primary memory store
Removes the embeddings from the vector index
Removes the entity nodes (and any node now orphaned by edge removal) from the graph
Invalidates and regenerates the derived wiki pages
Purges the cached recall responses scoped to this user
Tombstones the audit log entries so they're irretrievable but the audit shape is preserved for compliance
Logs the delete itself as an immutable record of which user, what data, when — with no PII

Most of this happens asynchronously. The user-facing promise is "within 72 hours" (or whatever your DPA says), not "instantly."
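One way to structure the propagating delete described above is a list of idempotent steps that is retried until every layer reports success. This is a simplified sketch: DeleteSaga is a hypothetical name, progress is tracked in memory, and a real system would persist saga state and run it from a background worker.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DeleteSaga:
    # Ordered (name, step) pairs. Each step takes a user_id and must be
    # idempotent so that a retry after a partial failure is safe.
    steps: list
    completed: dict = field(default_factory=dict)

    def run(self, user_id: str) -> list:
        """Run the steps not yet completed; return the names that still failed."""
        done = self.completed.setdefault(user_id, set())
        failed = []
        for name, step in self.steps:
            if name in done:
                continue
            try:
                step(user_id)
                done.add(name)
            except Exception:
                # Leave the step pending; the next run() retries it.
                failed.append(name)
        return failed


# Usage with two stand-in storage layers (plain dicts, for illustration only).
primary = {"u1": ["fact"]}
vectors = {"u1": [[0.1, 0.2]]}
saga = DeleteSaga(steps=[
    ("primary_store", lambda uid: primary.pop(uid, None)),
    ("vector_index", lambda uid: vectors.pop(uid, None)),
])
remaining = saga.run("u1")  # an empty list means every layer has been purged
```

The delete is only complete when run() returns an empty list; anything else should be retried inside the SLA window and alerted on if it persists.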

What "done" looks like: a tester deletes a user, then queries every storage layer and every cache. After the SLA window, no system returns the deleted user's content. The audit log shows the delete event but not the deleted content.

Problem 5 — Per-tenant budget visibility

Tenant 47 (the noisy one from problem 2) is also unprofitable. You wouldn't know that until you have per-tenant cost visibility, which most early-stage memory stacks don't have. Without it, your gross-margin number is the average across every tenant — useful for board slides, useless for operating.

What to do about it: attribute compute costs to tenants at the call-path level.

Tag every embedding call, every retrieval call, every LLM call (if you do server-side extraction) with tenant_id
Aggregate nightly into per-tenant cost rows
Expose a per-tenant cost dashboard internally
Enforce a per-tenant budget cap (with a configurable soft and hard cap) so a runaway tenant can't take the rest of your margin with them

What "done" looks like: you can answer "is tenant X profitable at their current plan?" in less than a minute. You can name your three most expensive tenants on demand. A runaway tenant triggers an alert at 80% of cap and a soft cap at 100% before they hit the hard cap.
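Cost attribution and cap enforcement can share one small ledger. The sketch below assumes the alert fires at 80% of the soft cap and that the hard cap sits above the soft cap; TenantCostLedger, BudgetStatus, and the cost kinds are hypothetical names, and the nightly aggregation step is left out.

```python
from enum import Enum


class BudgetStatus(Enum):
    OK = "ok"
    ALERT = "alert"        # at or above alert_ratio x soft cap
    SOFT_CAP = "soft_cap"  # at or above the soft cap
    HARD_CAP = "hard_cap"  # at or above the hard cap; reject further calls


class TenantCostLedger:
    def __init__(self, soft_cap: float, hard_cap: float, alert_ratio: float = 0.8) -> None:
        if not 0 < soft_cap <= hard_cap:
            raise ValueError("expected 0 < soft_cap <= hard_cap")
        self.soft_cap = soft_cap
        self.hard_cap = hard_cap
        self.alert_ratio = alert_ratio
        # tenant_id -> {kind ("embedding", "retrieval", "llm") -> USD spent}
        self._spend = {}

    def record(self, tenant_id: str, kind: str, cost_usd: float) -> BudgetStatus:
        by_kind = self._spend.setdefault(tenant_id, {})
        by_kind[kind] = by_kind.get(kind, 0.0) + cost_usd
        return self.status(tenant_id)

    def total(self, tenant_id: str) -> float:
        return sum(self._spend.get(tenant_id, {}).values())

    def status(self, tenant_id: str) -> BudgetStatus:
        spent = self.total(tenant_id)
        if spent >= self.hard_cap:
            return BudgetStatus.HARD_CAP
        if spent >= self.soft_cap:
            return BudgetStatus.SOFT_CAP
        if spent >= self.alert_ratio * self.soft_cap:
            return BudgetStatus.ALERT
        return BudgetStatus.OK

    def top_spenders(self, n: int = 3) -> list:
        # Tenant ids ordered by total spend, most expensive first.
        return sorted(self._spend, key=self.total, reverse=True)[:n]
```

Tagging each call with its kind as well as its tenant keeps the breakdown available later, so "which tenant is expensive" and "why" can be answered from the same rows.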

Who owns what — the stack division

Of these five problems, how many should sit in your application code vs the memory layer you're using? Honest answer: it depends on what the memory layer commits to. Here's the split worth asking about when evaluating any layer:

Isolation — should live in the memory layer. If user_id is a parameter and the layer can't demonstrate physical or strong logical partitioning under it, you own this problem.
Noisy neighbors — should mostly live in the memory layer (rate limits, per-tenant queues), with the application managing call shape. If the memory layer shares a single embedding service across all tenants with no isolation, you own this problem.
Recall cost — split. Caching can live in either layer; the application controls when to call recall at all. The memory layer owns retrieval-stack efficiency.
GDPR delete — should live in the memory layer end-to-end (delete should propagate across every system the layer manages). The application is responsible for issuing the delete; the layer is responsible for completing it.
Per-tenant budget — split. The memory layer should expose per-tenant cost metrics; the application owns the business logic around plan enforcement.

A six-question evaluation checklist

Before you commit to a memory layer for a 10k+-user product, ask the vendor (or the OSS docs) for direct answers:

Is tenant isolation enforced at the storage layer? Show me the partitioning scheme.
What happens to per-tenant p99 recall latency when one tenant writes 10k facts/minute? Show me the load test or commit to running one.
What's the per-1k-recall cost breakdown — embedding, retrieval, rerank, optional LLM enrichment — and how does it scale with corpus size per tenant?
What's the SLA for a hard delete, and which derived systems does the delete propagate through? Show me a diagram.
Can I see per-tenant cost as a first-class metric, or do I have to derive it from logs?
What's the contract for "graceful failure when a tenant exceeds budget"? (Drop, queue, alert, error?)

The answers separate products designed for multi-tenant scale from products that bolted multi-tenancy onto a single-tenant prototype.

Where Ricord stands

We're going to apply our own checklist to ourselves because it's the only honest thing to do:

Isolation: user_id is required on every save/recall API call; tenant data is partitioned at the storage layer; no recall path defaults to cross-tenant.
Noisy neighbors: per-API-key rate limits live at the edge; background extraction is per-tenant-cohort queued; recall calls are bounded in scan size with a hard timeout.
Recall cost: we cache recall responses with tenant-scoped keys; rerank lets us hold top-k small without losing relevance; the dashboard shows per-account recall volume.
GDPR delete: a delete propagates to the primary store, vector index, graph, derived wiki pages, and cached recall responses. Audit-log entries are tombstoned so their content is irretrievable. The SLA is documented in the DPA.
Per-tenant budget: per-account usage and cost surface in the dashboard; ricord usage exposes it programmatically; hard-cap enforcement is at the plan boundary, not an after-the-fact alert.

If you're building a product that's going to need any of these on day 90, design for them on day 1 and pick a memory layer that already has them. Retrofitting isolation onto a system that was built without it is the most expensive engineering work you can do, and the cost compounds with every tenant.
