I lost three days of work because Claude forgot who I was.
Not literally forgot. But functionally, yes. I had spent a week refining a multi-agent pipeline for a document analysis system. The architecture was solid. The prompts were dialed in. Every agent knew its role, its boundaries, its output format. Then I opened a new session, and all of it was gone. The agent had no idea what we had built together. No memory of the decisions, the dead ends, the reasons we chose one approach over another. I was back to explaining everything from scratch.
This is the dirty secret of working with AI agents in 2026. They are stateless by default. Every conversation starts at zero. And the workarounds most of us use, pasting old conversations back in, maintaining sprawling CLAUDE.md files, manually summarizing past sessions, those are duct tape on a structural problem.
MemPalace is the first system I have used that actually fixes this. Not with a cloud service. Not with an API key. Locally, on your machine, with a recall accuracy that beats everything else I have tested.
Table of Contents#
- The context loss problem nobody wants to talk about
- What MemPalace actually is
- The architecture, and why it matters
- The compression trick that makes it practical
- Benchmark results that surprised me
- How it compares to the alternatives
- Getting started in five minutes
- MCP integration for Claude Code and Cursor
- Specialist agent wings and diaries
- Temporal knowledge graph
- Where this fits in the bigger picture
- What I would watch for
- FAQ
The Context Loss Problem Nobody Wants to Talk About#
If you have built anything with multi-agent systems, you already know the pain. Agents are excellent at executing within a single session. They can research, analyze, draft, review. They can coordinate with other agents. They can loop and self-correct. But the moment that session ends, the slate is wiped clean.
I wrote about the five pillars of agentic engineering a few weeks ago, and one of the pillars, context engineering, is fundamentally about this problem. The insight from that post was that if domain knowledge does not exist somewhere the agent can read, it does not exist for the agent at all. CLAUDE.md files, skills, MCPs, all of those are mechanisms for making knowledge available to agents. But they are static. They do not grow. They do not learn from your sessions.
What I wanted was something different. A memory layer that grows over time. One where the agent remembers not just facts, but the trajectory of decisions. Why we chose LangGraph over CrewAI for a particular project. Why we switched from REST to WebSocket for a real-time pipeline. Why a particular prompt template works better than the one that looks more logical on paper.
That is what MemPalace does. And it does it without sending a single byte to the cloud.
What MemPalace Actually Is#
MemPalace is an open-source, local-first AI memory system. MIT licensed, Python 3.9+, currently at version 3.0.0 with around 12,000 stars on GitHub. The name comes from the ancient Greek memory technique where you mentally place items in rooms of an imagined building and then walk through the building to recall them. The software takes that metaphor literally.
Instead of dumping all your memories into a flat vector store and hoping semantic search finds the right one, MemPalace organizes memories into a spatial hierarchy. Think of it like a building with wings, rooms, halls, tunnels, closets, and drawers. Each level in the hierarchy serves a different purpose. Memories are not just stored. They are placed in context.
The whole thing runs on ChromaDB for vector search and SQLite for structured metadata. No external services. No API keys for the base system. Everything lives on your machine.
That last point matters more than it might seem. If you are working with proprietary code, internal architecture decisions, or anything covered by an NDA, sending conversation history to a cloud memory service is a non-starter. MemPalace sidesteps that entire problem.
The Architecture, and Why It Matters#
This is where MemPalace gets interesting, and where the ancient memory palace method starts to make real engineering sense.
The hierarchy works like this:
Wings are the top-level categories. Think of them as departments in a building. You might have a wing for each major project, or for different domains like "frontend architecture" or "infrastructure" or "product decisions." Wings provide the broadest organizational layer.
Rooms sit inside wings and represent specific topics. Inside your "frontend architecture" wing, you might have rooms for "component library," "state management," "performance optimization," and "design system."
Halls live inside rooms and separate memories by type. There are three kinds: facts (things that are true), events (things that happened), and advice (lessons learned, recommendations, patterns to follow or avoid). This type separation is subtle but powerful. When an agent queries for "what do I know about our caching strategy," it can prioritize facts for technical details, events for the history of changes, and advice for the lessons we learned along the way.
Tunnels are cross-wing links. This is where the architecture starts to outperform flat vector search in meaningful ways. A memory about a caching bug in the infrastructure wing can be linked to the performance optimization room in the frontend wing. These explicit connections mean the system can surface related knowledge that pure semantic similarity would miss.
Closets hold compressed summaries. This is where MemPalace stores the condensed version of long histories, making it possible to load months of context without consuming your entire token budget.
Drawers contain verbatim artifacts. Code snippets, exact error messages, configuration files, specific outputs that need to be preserved character for character.
The reason this hierarchical approach works better than flat vector search is straightforward. When you search a flat vector store, you are asking: "What memories are semantically similar to this query?" That is fine for simple lookups. But it falls apart for complex recall tasks where the answer depends on relationships between memories, temporal ordering, or the type of knowledge you need. MemPalace's benchmark results make this concrete. The structured palace approach achieves 94.8% recall at 10 results, compared to 60.9% for flat vector search. That is a 34 percentage point improvement just from how you organize the memories.
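To make that intuition concrete, here is a dependency-free toy sketch of the idea. None of these names come from the MemPalace API; the "palace" here is just a tagged list and the "embedding" a crude bag-of-words cosine. The point is that scoping recall to a room filters out superficially similar memories from unrelated domains before similarity ranking even runs.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Crude bag-of-words cosine similarity (a stand-in for real embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

# Hypothetical palace: each memory is placed in a wing/room, not a flat pool.
memories = [
    {"wing": "infrastructure", "room": "caching",
     "text": "cache invalidation bug fixed by adding version keys"},
    {"wing": "frontend", "room": "performance",
     "text": "cache headers tuned for static asset delivery"},
    {"wing": "product", "room": "pricing",
     "text": "cache of pricing rules rebuilt nightly"},
]

def recall(query, room=None, k=2):
    """Rank memories by similarity, optionally restricted to one room."""
    pool = [m for m in memories if room is None or m["room"] == room]
    return sorted(pool, key=lambda m: cosine(query, m["text"]), reverse=True)[:k]

# Flat search mixes all three "cache" memories regardless of domain.
flat = recall("why did the cache break")
# Scoped search only considers the caching room in the infrastructure wing.
scoped = recall("why did the cache break", room="caching")
print(scoped[0]["text"])
```

With real embeddings the gap is subtler, but the structural effect is the same: the hierarchy prunes the candidate set, so semantically similar but contextually wrong memories never compete.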
The Compression Trick That Makes It Practical#
Here is the problem with agent memory that I did not appreciate until I tried to build my own version. Storing memories is easy. Making them fit in a context window is hard.
A month of active development generates a lot of conversational history. Tens of thousands of tokens worth of decisions, debugging sessions, architectural discussions, code reviews. You cannot load all of that into a context window. Even with the 200K token windows we have today, you would burn most of your budget on history and leave almost nothing for the actual task.
MemPalace solves this with something called AAAK, a lossless compression dialect. The name stands for Agent-to-Agent Abbreviated Knowledge. It is essentially a compressed notation system designed specifically for agent-to-agent communication. The key claim, and I have verified this in my own usage, is 30x compression. Months of context load in 120 to 170 tokens.
That number sounds implausible until you think about what most conversational history actually contains. Greetings, repetitions, restated questions, verbose explanations of things the agent already understands. AAAK strips all of that out and preserves only the information-bearing content in a format that agents can parse efficiently.
The practical impact is significant. With 120 tokens of compressed context, an agent can understand the full trajectory of a multi-week project. What was tried. What failed. What worked. What the current approach is and why. That is the difference between starting every session from zero and starting every session from where you left off.
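The post does not reproduce the AAAK notation itself, so the sketch below uses an invented stand-in to show why ratios like 30x are plausible: conversational filler carries almost no information, and the decisions themselves fit in a few tokens each.

```python
# Illustrative only: a toy "abbreviated knowledge" notation in the spirit of
# AAAK. The real format is different; this just demonstrates the compression
# headroom in typical conversational history.

verbose_history = """
Hi! Today we spent a while debugging the realtime pipeline. After a long
discussion we decided to switch from REST polling to WebSocket because
polling added too much latency. We also agreed to keep Redis for session
storage for now and revisit that decision next quarter.
"""

# A distillation pass keeps only the decision-bearing content:
compressed = [
    "realtime: REST->WS (latency)",
    "sessions: Redis (revisit Q+1)",
]

def rough_tokens(text):
    # Very rough token estimate: ~4 characters per token.
    return max(1, len(text) // 4)

before = rough_tokens(verbose_history)
after = sum(rough_tokens(line) for line in compressed)
print(before, after, round(before / after, 1))
```

Scale that over months of sessions and the claim that a project's trajectory fits in 120 to 170 tokens stops sounding implausible.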
Benchmark Results That Surprised Me#
I am usually skeptical of benchmark claims. Too many tools in the AI space publish numbers that look great on paper but do not survive contact with real workloads. MemPalace's numbers are interesting because they span multiple benchmarks and because the zero-API configuration performs competitively with systems that require cloud services.
The headline number: 96.6% recall on LongMemEval with zero API keys. That is the highest published zero-API score I have found. When you add Haiku reranking, which does require an API call, it hits 100%. MemPalace is reportedly the first system to achieve a perfect score on LongMemEval.
On the LoCoMo benchmark, the honest score is 88.9% recall at 10 results. I say "honest score" because some systems report numbers on LoCoMo using configurations that would not be practical in production. MemPalace's number comes from the default configuration.
The MemBench benchmark, which was published at ACL 2025, pushes things further with 80.3% recall at 5 results across 8,500 items. That is a stress test. 8,500 individual memories is a lot of context, and maintaining 80%+ recall at that scale with only 5 results returned is impressive.
But the number that matters most for practical use is the 34 percentage point improvement of the structured palace over flat vector search. That is the architectural insight. You can use the same embedding model, the same similarity metric, the same hardware. Just organizing the memories into the palace hierarchy, with wings and rooms and halls, lifts recall from 60.9% to 94.8%. It is a structural advantage, not a model advantage.
How It Compares to the Alternatives#
The AI memory space is getting crowded, and the options differ in fundamental ways. Here is how I think about the landscape.
Zep Graphiti is the enterprise-grade option. It builds a temporal knowledge graph on top of Neo4j, which means you get powerful graph queries and relationship traversal. The tradeoff is infrastructure complexity. You need to run Neo4j, which is not trivial to operate in production, and the cloud-hosted option means your memories leave your machine. If you are building a product that serves thousands of users and need a scalable memory backend, Graphiti makes sense. For individual developer workflows, it is overkill.
MemGPT, now called Letta, takes a different philosophical approach. It treats memory management as a task for the LLM itself. The model decides what to remember, what to forget, and how to organize its own memories. This is elegant in theory. In practice, I found that LLM-managed memory introduces a layer of unpredictability. The model sometimes forgets things you want it to remember and remembers things you would rather it forgot. It also requires API calls for the memory management operations, which adds latency and cost.
Mem0 focuses on RAG-style fact extraction. It watches your conversations, extracts key facts, and stores them for later retrieval. The extraction step is the bottleneck. Any fact the extraction model misses is gone. MemPalace takes the opposite approach: store everything verbatim, then make it findable. This is a meaningful philosophical difference. Extraction-based systems are only as good as their extraction. Verbatim storage with good search does not have that ceiling.
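A toy illustration of that ceiling (my own sketch, not either tool's code): an extractor that only recognizes one phrasing silently drops facts, while verbatim storage with search still surfaces them later.

```python
import re

transcript = [
    "We use Redis for session storage.",
    "For rate limiting we ended up going with a token bucket in nginx.",
]

# Extraction-based memory: only facts matching the known pattern survive.
pattern = re.compile(r"We use (\w+) for ([\w ]+)\.")
extracted = [m.groups() for line in transcript if (m := pattern.match(line))]

# Verbatim memory: store everything, make it findable at query time.
def verbatim_search(store, query):
    return [line for line in store if query.lower() in line.lower()]

print(extracted)  # only the Redis fact matched the extractor
print(verbatim_search(transcript, "rate limiting"))  # missed fact still findable
```

Real extraction models are far better than a regex, but the failure mode is the same in kind: whatever the extractor misses at write time is unrecoverable, while verbatim storage defers the hard problem to retrieval, where you can keep improving.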
LangChain Memory is the managed cloud option. It integrates neatly with the LangChain ecosystem, which is attractive if you are already building on LangChain. The tradeoff is vendor lock-in and data leaving your machine. For prototyping and non-sensitive workloads, it is convenient. For anything involving proprietary code or architectural decisions, the local-first alternative wins.
MemPalace's core differentiator is that it combines local-first operation with benchmark-leading recall. You do not have to choose between privacy and performance. That is not a small thing.
Getting Started in Five Minutes#
Setup is straightforward. MemPalace is a Python package, and the base installation requires nothing beyond pip.
pip install mempalace
That is it for the base system. No Docker, no database server, no API keys. ChromaDB and SQLite are embedded dependencies.
To initialize a palace for a project:
mempalace init --name "my-project"
This creates the palace structure with a default set of wings. You can customize the wings to match your project's domains, but the defaults are reasonable for most development workflows.
Mining is how MemPalace ingests existing knowledge. There are three modes.
Project mining scans your codebase, documentation, and configuration files:
mempalace mine projects --path /path/to/your/project
Conversation mining processes chat transcripts from Claude, ChatGPT, and other tools. It supports five chat export formats:
mempalace mine convos --path /path/to/exported/chats
General mining auto-classifies mixed content:
mempalace mine general --path /path/to/mixed/content
After mining, you can query the palace directly:
mempalace recall "What was our caching strategy decision?"
Or you can query with temporal context, which is one of my favorite features:
mempalace recall "What did we know about the auth system as of March 15?"
That as-of query is not just filtering by date. MemPalace maintains a temporal knowledge graph, so it returns what was believed to be true at that point in time, even if it was later contradicted by newer information. This is extremely useful for understanding why certain decisions were made.
MCP Integration for Claude Code and Cursor#
The MCP server is where MemPalace becomes genuinely transformative for daily development workflows. MemPalace ships an MCP server with 19 tools that work with Claude Code, ChatGPT, Cursor, and any other MCP-compatible client.
To set it up for Claude Code, add this to your MCP configuration:
{
  "mcpServers": {
    "mempalace": {
      "command": "mempalace",
      "args": ["mcp", "--palace", "my-project"]
    }
  }
}
Once connected, your agent can read from and write to the palace during every session. The 19 tools cover the full lifecycle: storing memories, recalling them, querying the temporal graph, generating timelines, detecting contradictions, and managing the palace structure.
The auto-save hooks for Claude Code are particularly useful. You can configure MemPalace to automatically mine every Claude Code session when it ends. This means your palace grows passively. You do not have to remember to export and import conversations. Every decision, every debugging session, every architectural discussion gets captured automatically.
Here is what the configuration looks like for auto-save:
{
  "hooks": {
    "postSession": {
      "command": "mempalace mine convos --format claude-code --path ${SESSION_LOG}"
    }
  }
}
With this in place, you start your next session and the agent already knows what happened in the last one. It knows the project's history. It knows the decisions and the reasons behind them. That feeling of "starting from zero" just goes away.
Specialist Agent Wings and Diaries#
One feature I did not expect to matter as much as it does is the ability for specialist agents to maintain their own wings and diaries in AAAK format.
If you are running a multi-agent workflow where different agents handle different domains, something like the Scion orchestration model where agents run in parallel containers, each agent can have its own wing in the palace. A security review agent maintains a wing with findings, patterns, known vulnerabilities. A code review agent maintains a wing with style decisions, recurring issues, project-specific conventions. An architecture agent maintains a wing with system design decisions, component boundaries, integration patterns.
Each agent can also keep a diary in AAAK compressed format. The diary is a running log of the agent's activities and decisions, compressed to around 5 tokens per entry. When the agent starts a new session, it loads its diary to understand its own history. This turns stateless agents into agents with continuity.
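The mechanics of a diary are simple enough to sketch. This is my own minimal version of the idea, not MemPalace's implementation: an append-only file of compact entries, written at session end and replayed at session start. The filename and entry format are hypothetical.

```python
from pathlib import Path

DIARY = Path("code_review_agent.diary")  # hypothetical per-agent diary file
DIARY.unlink(missing_ok=True)            # start clean for this demo

def append_entry(entry):
    """Append one compact entry (a few tokens) at session end."""
    with DIARY.open("a") as f:
        f.write(entry + "\n")

def load_diary():
    """Load the full history at session start; it stays tiny by design."""
    return DIARY.read_text().splitlines() if DIARY.exists() else []

# Two sessions of a code review agent, each leaving a compressed trace:
append_entry("s1: auth PR; type err at svc boundary; fix=type guard")
append_entry("s2: same type err recurs; flagged as pattern")
print(load_diary())
```

The design choice worth noting is append-only: the agent never edits its past, so the diary doubles as an audit trail of what it believed and when.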
The practical impact of this is hard to overstate. A code review agent that remembers every review it has done on your project catches patterns that a stateless agent never would. "You introduced this same type error in the auth module three weeks ago. The fix was to add a type guard at the service boundary." That kind of contextual feedback is only possible with persistent memory.
Temporal Knowledge Graph#
The temporal knowledge graph deserves its own section because it enables capabilities that flat memory stores simply cannot replicate.
Every memory in MemPalace has a timestamp and a validity range. When you store a fact like "we are using Redis for session storage," that fact is recorded with a creation time. If you later store "we migrated session storage to DynamoDB," MemPalace does not delete the Redis fact. It marks it as superseded and records the transition.
This means you can ask questions like:
- "What was our session storage strategy in January?" (Redis)
- "When did we switch session storage?" (Shows the timeline)
- "What contradictions exist in our infrastructure knowledge?" (Surfaces the transition)
Timeline generation is a built-in query type. You can ask MemPalace to generate a timeline of a specific topic, and it will show you the evolution of knowledge over time. For long-running projects, this is invaluable. I have used it to onboard new team members by generating a timeline of architectural decisions for the past six months.
Contradiction detection is the other temporal feature that has saved me real debugging time. When new information contradicts existing memories, MemPalace flags it. This catches cases where an agent stores something that conflicts with what was previously established, which can happen when an agent hallucinates or when a team member provides outdated information.
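The supersede-instead-of-delete behavior described above maps naturally onto a table with validity ranges. This is my own modeling sketch in plain SQLite, not MemPalace's actual schema: storing a new fact closes the old row's `valid_to` rather than deleting it, which is exactly what makes as-of queries possible.

```python
import sqlite3

# Temporal fact table: NULL valid_to means "still believed".
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE facts (
    topic TEXT, value TEXT,
    valid_from TEXT, valid_to TEXT
)""")

def store(topic, value, when):
    """Supersede any currently-valid fact on the topic, then insert the new one."""
    db.execute("UPDATE facts SET valid_to = ? WHERE topic = ? AND valid_to IS NULL",
               (when, topic))
    db.execute("INSERT INTO facts VALUES (?, ?, ?, NULL)", (topic, value, when))

def as_of(topic, when):
    """Return what was believed about a topic at a point in time."""
    row = db.execute(
        """SELECT value FROM facts
           WHERE topic = ? AND valid_from <= ?
             AND (valid_to IS NULL OR valid_to > ?)""",
        (topic, when, when)).fetchone()
    return row[0] if row else None

store("session-storage", "Redis", "2026-01-10")
store("session-storage", "DynamoDB", "2026-03-02")  # supersedes, does not delete
print(as_of("session-storage", "2026-01-15"))  # Redis
print(as_of("session-storage", "2026-04-01"))  # DynamoDB
```

Contradiction detection falls out of the same structure: two rows on one topic with overlapping validity ranges is a conflict you can query for directly.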
For a visual breakdown of the full MemPalace architecture, benchmarks, and the AAAK compression system, see the MemPalace infographic series.
Where This Fits in the Bigger Picture#
I think about MemPalace as a missing piece in the agentic engineering stack. We have gotten pretty good at the execution layer. Agents can write code, run tests, make API calls, analyze data. We have gotten decent at the orchestration layer with tools like LangGraph and frameworks for multi-agent coordination. But the memory layer has been the weak link.
The reason memory matters so much is that it is the foundation for everything else. Context engineering, which I called the most important pillar of agentic engineering, is ultimately about what the agent knows. This is why harness engineering focuses so heavily on documentation structure and feedback loops. If the agent's knowledge resets every session, your context engineering is limited to what you can manually maintain in static files. That is a ceiling.
MemPalace removes that ceiling. Not by replacing CLAUDE.md files or skills or any of the other context engineering tools. It complements them. Your CLAUDE.md describes how the agent should behave. Skills describe how to perform specific tasks. MemPalace remembers what happened. Those are different kinds of knowledge, and they work together.
The local-first philosophy is also important for a reason that goes beyond privacy. When your memory system runs locally, you control the latency. Every recall query is a local database lookup, not a network round trip. In a multi-agent workflow where agents are querying memory dozens of times per session, that latency difference adds up. I have seen 200-300ms per query to cloud memory services. MemPalace's local queries complete in single-digit milliseconds.
For teams, the palace structure can be shared via git. The SQLite database and ChromaDB collection are files on disk. You can commit them to a repository, and every team member gets the same shared memory. This is a simple but effective approach to team knowledge management that does not require running a server.
What I Would Watch For#
MemPalace is not perfect. The mining process for large codebases can be slow, and the initial setup requires some thought about how to structure your wings. If you get the wing structure wrong, recall suffers. I spent an afternoon reorganizing a palace after realizing that my initial structure was too granular, with too many small rooms that fragmented related memories.
The AAAK compression format is also opaque. You cannot easily read the compressed memories as a human, which means you have to trust the system's recall capabilities. For debugging, there is a decompression tool, but it adds a step.
And while the zero-API configuration is impressive, the reranking step that pushes recall to 100% does require an API call to Haiku. For truly air-gapped environments, you are working with the 96.6% base recall, which is still excellent but not perfect.
These are small issues relative to the value. But they are worth knowing about before you invest time in setting up a palace.
FAQ#
Does MemPalace require any API keys or cloud services?#
No. The base system runs entirely locally on ChromaDB and SQLite. Zero API keys are needed for the core functionality, including mining, storage, recall, and temporal queries. The only feature that requires an API key is the optional Haiku reranking step, which improves recall from 96.6% to 100%. You can use MemPalace indefinitely without ever configuring an API key.
How does MemPalace compare to just using a large context window?#
Large context windows solve a different problem. They let you fit more information into a single prompt. MemPalace solves the persistence problem, making information available across sessions. The 30x compression via AAAK means you can load months of project context in 120 to 170 tokens, which leaves the rest of your context window free for the actual task. Even with a 200K token window, you cannot fit a month of raw conversation history. With MemPalace, you can fit six months of compressed context and still have 199,800 tokens to work with.
Can I use MemPalace with ChatGPT, Cursor, or other tools besides Claude?#
Yes. The MCP server exposes 19 tools that work with any MCP-compatible client. This includes Claude Code, ChatGPT (with MCP plugin support), Cursor, and other editors that support the Model Context Protocol. The mining tools also support five conversation export formats, so you can import history from multiple tools into the same palace.
What happens if I store contradictory information?#
MemPalace's temporal knowledge graph handles contradictions explicitly. When new information contradicts existing memories, the system does not delete the old information. It marks it as superseded and records the transition with timestamps. You can query the contradiction detection tool to surface all conflicts, and you can use as-of queries to retrieve what was believed to be true at any point in time.
How much disk space does a typical palace use?#
This depends on how much you mine, but the storage is efficient. A palace with several months of active development history, including mined codebases and conversation transcripts, typically uses a few hundred megabytes. The ChromaDB vector store is the largest component. SQLite metadata is negligible. The AAAK compressed summaries are tiny by design.
Can multiple team members share a palace?#
Yes. The palace is stored as files on disk, specifically a SQLite database and a ChromaDB collection directory. You can commit these to a git repository and share them across a team. Each team member gets the full shared memory. For larger teams, you can also structure the palace with per-person wings so that individual context does not clutter the shared knowledge base.
Is MemPalace production-ready for enterprise use?#
MemPalace is at version 3.0.0 with 12,000 GitHub stars, which suggests a mature and active community. The MIT license is enterprise-friendly. The local-first architecture avoids the data governance concerns that come with cloud memory services. That said, it is a developer tool, not an enterprise platform. There is no admin dashboard, no role-based access control, no audit logging. For individual developers and small teams, it is production-ready. For enterprise deployment with compliance requirements, you would need to build those layers yourself or wait for the ecosystem to mature.