When I first read through the Redis Iris architecture, the detail that surprised me most was not the CDC pipeline or the vector search. It was a single constraint that the whole system is built around: the agent never touches your operational systems directly. Not the database. Not the OMS. Not the CRM. The agent talks to Redis, and only Redis.
That sounds simple, but it is not a trivial constraint to design around. Everything else in the architecture (the change data capture pipeline, the generated tools, the memory tiers, the semantic cache) is downstream of that one decision. Understanding why the constraint exists is the right entry point for understanding how the system works.
In Part 1 of this series, I covered the four failure modes of naive RAG in production: stale state, slow retrieval, fragmented memory, and disconnected tools. Redis Iris is a direct answer to all four. This post is about how the answer is built.
This Post is Part of a Series
The Context Layer Problem is a 4-part series on why retrieval fails in production AI systems and what to do about it.
- Part 1: Why Your RAG Pipeline Fails in Production — the 4 runtime failure modes
- Part 2: How Redis Iris Actually Works — RDI, Context Retriever, Memory, LangCache
- Part 3: Redis Iris vs. Pinecone Nexus vs. Naive RAG — decision framework
- Part 4: Should You Actually Use Redis Iris? — honest builder verdict
Table of Contents
- Why the agent should not query the database directly
- Component 1: Redis Data Integration
- Component 2: Context Retriever
- Component 3: Agent Memory
- Component 4: LangCache
- Redis Flex and the cost question
- What this architecture actually assumes
- FAQ
Why the Agent Should Not Query the Database Directly
Before getting into the components, it is worth being explicit about why the "agent reads from Redis only" constraint exists. There are two reasons and both are practical rather than theoretical.
The first is query volume. Agents at production scale generate query patterns that no transactional database was built to handle. A human analyst might run fifty queries against a Postgres instance over a full workday. An agentic pipeline running product recommendations, order lookups, and customer context enrichment in parallel across many concurrent sessions can generate that volume in seconds. The OLTP database is sized for its application workload. It is not sized to also be the inference substrate for agents. I have watched teams ship agent prototypes that work beautifully at low traffic and then watch latency collapse at production scale because every agent hop was hitting the production database. The separation is not premature optimization. It is load management.
The second reason is schema shape. A normalized relational schema is correct for transactional integrity. It is a bad fit for agent-time reads. When an agent needs a flat representation of "what did this customer order last quarter that is currently out of stock," it has to join across orders, line items, products, and customer tables, resolve foreign keys, apply filters, and reconstruct a business object, all under latency pressure. The agent does not need normalized correctness. It needs a denormalized, pre-indexed copy shaped for the access patterns it actually uses. Redis Iris is built on the premise that you maintain a Redis copy of your data shaped specifically for agent reads, separate from the write-optimized relational schema in your source system.
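To make the schema-shape point concrete, here is a minimal sketch of resolving the joins once, ahead of time, into the kind of flat object an agent would read. The table layouts, field names, and the `build_customer_view` helper are all hypothetical illustrations, not part of Redis Iris.

```python
from collections import defaultdict

# Hypothetical normalized rows, as they might sit in a relational source.
customers = [{"id": 1, "name": "Acme Corp"}]
orders = [{"id": 10, "customer_id": 1, "status": "shipped"}]
line_items = [
    {"order_id": 10, "product_id": 100, "qty": 2},
    {"order_id": 10, "product_id": 101, "qty": 1},
]
products = {
    100: {"name": "Widget", "in_stock": False},
    101: {"name": "Gadget", "in_stock": True},
}


def build_customer_view(customer_id):
    """Resolve the joins once, at write time, into one flat document."""
    customer = next(c for c in customers if c["id"] == customer_id)
    items_by_order = defaultdict(list)
    for item in line_items:
        product = products[item["product_id"]]
        items_by_order[item["order_id"]].append(
            {
                "product": product["name"],
                "qty": item["qty"],
                "in_stock": product["in_stock"],
            }
        )
    return {
        "customer_id": customer["id"],
        "name": customer["name"],
        "orders": [
            {"order_id": o["id"], "status": o["status"], "items": items_by_order[o["id"]]}
            for o in orders
            if o["customer_id"] == customer_id
        ],
    }


view = build_customer_view(1)

# The agent-time question becomes a scan of one document, with no joins.
out_of_stock = [
    item["product"]
    for order in view["orders"]
    for item in order["items"]
    if not item["in_stock"]
]
```

The join cost is paid when the copy is built, so the read path only filters a document that already has the right shape.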
With that context, the four components make more sense.
Component 1: Redis Data Integration
Redis Data Integration, or RDI, is the pipeline that keeps the Redis copy of your data synchronized with your source databases. It implements change data capture, connecting to source systems like Postgres, Oracle, MongoDB, and Snowflake and tracking change events at the transaction log level. Those events stream continuously into Redis data structures.
The distinction between CDC and batch synchronization matters here and it is worth being precise about it.
A batch job that syncs your database to Redis every fifteen minutes gives you a copy that can be up to fifteen minutes stale, and that is the best case, where the job itself takes no time. If the batch job takes three minutes to run, and it runs every fifteen minutes, your worst case is an eighteen-minute lag. For order status, inventory levels, and ticket state in an active system, that lag is not acceptable.
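The eighteen-minute figure can be checked with a small simulation. This is a sketch under one stated assumption: each run snapshots the source when it starts and publishes to Redis when it finishes. The function names are made up for the example.

```python
def visible_snapshot_time(t, interval, runtime):
    """Start time of the newest batch run whose results a reader at time t can see.

    Runs start at 0, interval, 2*interval, ... Each run snapshots the source
    when it starts and publishes to Redis `runtime` seconds later.
    """
    newest_published_run = (t - runtime) // interval
    return newest_published_run * interval


def staleness(t, interval, runtime):
    """Age, in seconds, of the data a reader sees at time t."""
    return t - visible_snapshot_time(t, interval, runtime)


INTERVAL = 15 * 60  # sync every fifteen minutes
RUNTIME = 3 * 60    # each run takes three minutes

# Sample every second across several cycles, after the first run has published.
samples = [staleness(t, INTERVAL, RUNTIME) for t in range(2 * INTERVAL, 8 * INTERVAL)]
worst_case = max(samples)  # approaches INTERVAL + RUNTIME, i.e. eighteen minutes
best_case = min(samples)   # equals RUNTIME, right after a run publishes
```

Even immediately after a run publishes, the copy is already as old as the run took, which is why the lag never drops to zero under batch sync.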
CDC tracks individual change events as they are committed to the source system. An order status update propagates into the Redis layer in seconds, not on the next job run. A price change, an inventory decrement, a ticket status transition, these all flow through as they happen. This is what makes the "always fresh" claim in the Redis Iris marketing actually credible rather than aspirational.
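As a rough illustration of event-by-event propagation, the sketch below applies a stream of change events to a read copy. A plain dict stands in for Redis, and the event shape (`op`, `table`, `pk`, `after`) is a hypothetical simplification rather than RDI's actual format.

```python
# A plain dict stands in for Redis; keys follow a "table:pk" naming scheme.
store = {}


def apply_change(event):
    """Apply one change event to the read-optimized copy."""
    key = f"{event['table']}:{event['pk']}"
    if event["op"] in ("insert", "update"):
        # Merge, so a partial update keeps the fields it did not touch.
        store.setdefault(key, {}).update(event["after"])
    elif event["op"] == "delete":
        store.pop(key, None)
    else:
        raise ValueError(f"unknown op: {event['op']}")


events = [
    {"op": "insert", "table": "order", "pk": 10, "after": {"status": "placed", "total": 42.0}},
    {"op": "update", "table": "order", "pk": 10, "after": {"status": "shipped"}},
    {"op": "insert", "table": "order", "pk": 11, "after": {"status": "placed", "total": 9.5}},
    {"op": "delete", "table": "order", "pk": 11, "after": None},
]

# Events are applied in commit order, one at a time, rather than in periodic bulk loads.
for event in events:
    apply_change(event)
```

Each committed change updates only the key it affects, so the copy's lag is bounded by pipeline latency rather than by a job schedule.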
The copy that lands in Redis is not a mirror of the relational schema. It is denormalized and shaped for fast reads. Flattened business objects, pre-computed relationships, indexed fields. You are trading write consistency guarantees for read performance. The agent does not need ACID semantics for a product catalog read. It needs the read to be fast and the data to reflect current reality.
I want to be honest about what is novel here and what is not. Copying operational data to a read-optimized layer is not a new idea. Analytics engineers have been doing it with data warehouses for decades. Application developers have used Redis as a cache in front of Postgres since Redis existed. What RDI adds is tooling designed around the MCP surface that agents consume, rather than the SQL surface that analysts use. The CDC mechanism itself is standard. The application layer on top of it, the entity modeling, the tool generation, the agent-aware access patterns, is what is being productized.
Component 2: Context Retriever
Context Retriever is where the architecture becomes concretely useful to an agent builder.
You define entity models: the business objects your agent needs to reason about, their fields, and their relationships. A Product entity might include fields for ID, name, price, tags, stock count, and a relationship to its Category. An Order entity might include customer ID, line items, status, fulfillment events, and relationships to the Customer and to any open support tickets. You write these models once, upfront.
Context Retriever reads those model definitions and automatically generates MCP tools and CLI tools from them. For a Product entity, you get tools like find_product_by_price_range, filter_products_in_stock, and filter_by_tags. For a Customer entity, get_customer_by_id and search_customers_by_text. The agent calls these tools by name with typed parameters. It never writes a raw query. It never constructs a join. It never needs to know whether the underlying Redis data structure is a Hash, a Sorted Set, or a vector index.
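The idea of deriving tools from a model can be sketched in a few lines. Everything here is hypothetical: `EntityModel`, `generate_tools`, and the in-memory `catalog` are stand-ins invented for illustration and do not reflect Context Retriever's real model syntax or generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityModel:
    name: str
    fields: tuple             # field names exposed to the agent
    range_fields: tuple = ()  # numeric fields that should get a range tool


def generate_tools(model, records):
    """Derive named, parameterized lookup tools from an entity model."""
    entity = model.name.lower()
    tools = {}

    def project(record):
        # Only fields declared in the model ever reach the agent.
        return {f: record[f] for f in model.fields}

    def get_by_id(entity_id):
        match = next((r for r in records if r["id"] == entity_id), None)
        return project(match) if match else None

    tools[f"get_{entity}_by_id"] = get_by_id

    for field in model.range_fields:
        # Bind `field` as a default argument so each tool keeps its own field.
        def find_by_range(low, high, _field=field):
            return [project(r) for r in records if low <= r[_field] <= high]

        tools[f"find_{entity}_by_{field}_range"] = find_by_range

    return tools


product_model = EntityModel(
    name="Product",
    fields=("id", "name", "price", "stock"),
    range_fields=("price",),
)
catalog = [
    {"id": 1, "name": "Widget", "price": 12.0, "stock": 0, "supplier_cost": 4.0},
    {"id": 2, "name": "Gadget", "price": 48.0, "stock": 7, "supplier_cost": 20.0},
]
tools = generate_tools(product_model, catalog)
cheap = tools["find_product_by_price_range"](10, 20)
```

The agent only ever sees the names in `tools` and their parameters. The storage layout behind them, and any field left out of the model such as `supplier_cost`, stays hidden.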
The joins are resolved at model definition time, not at agent execution time. This is the crucial difference from having the agent query multiple sources and join in its reasoning context. If a Customer has a relationship to their recent Orders, that relationship is encoded in the entity model, and Context Retriever materializes it so the agent can retrieve a customer-with-recent-orders object in a single tool call. Role-level access control is also defined at the entity level. You specify which roles can read which entities, and the tool layer enforces it before the query reaches the data.
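Enforcing access in the tool layer, before any data is read, might look like the following sketch. The `READ_POLICY` table, the `guarded` wrapper, and the role names are assumptions made for the example, not the product's configuration format.

```python
# Hypothetical policy: entity name -> roles allowed to read it.
READ_POLICY = {
    "Product": {"support", "sales"},
    "Customer": {"support"},
}


class AccessDenied(Exception):
    pass


def guarded(entity, tool):
    """Wrap a tool so the role check runs before the underlying lookup."""

    def wrapper(role, *args, **kwargs):
        if role not in READ_POLICY.get(entity, set()):
            raise AccessDenied(f"role {role!r} may not read {entity}")
        return tool(*args, **kwargs)

    return wrapper


def get_customer_by_id(customer_id):
    # Stand-in for a real lookup against the Redis copy.
    return {"id": customer_id, "name": "Acme Corp"}


safe_get_customer = guarded("Customer", get_customer_by_id)
allowed = safe_get_customer("support", 1)
```

Because the check sits in the wrapper, a call with a role outside the policy fails before `get_customer_by_id` runs, and the agent's prompt never has to carry permission logic.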
This component directly addresses two of the failure modes I covered in Part 1. Disconnected tools: the agent now has a single, typed tool surface rather than ad-hoc calls to heterogeneous systems. Slow retrieval: the joins are pre-resolved, so multi-hop chains that were assembling a joined view at agent runtime are replaced by a single tool call against a pre-indexed entity.
In terms of agentic LLM workflow patterns, the Context Retriever is essentially a well-engineered tool layer. The agent still calls tools. But the tools are generated from a coherent entity model rather than hand-rolled against individual system APIs.
Component 3: Agent Memory
Redis Iris organizes agent memory into two tiers with different purposes and different persistence characteristics.
Short-term memory stores current conversation state and session history. It is TTL-based. You configure how long session state persists before it is cleared. When the TTL expires, the session context is gone unless it was explicitly promoted to long-term memory. In fast-moving data environments, this is a feature rather than a limitation. A shopping session from four days ago should not inform today's product recommendations without careful qualification. Stale session context can be as harmful as stale retrieval data.
Long-term memory stores user preferences, learned behavioral patterns, and extracted insights promoted from short-term sessions. It persists across sessions until explicitly evicted. In practice this means defining a retention policy upfront: which patterns are worth keeping and which just accumulate noise over time. An agent that learned in a prior session that a customer prefers metric units, always filters by in-stock items, and works in an enterprise procurement context does not start the next session cold. It starts with that accumulated context and can apply it immediately.
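The two tiers can be sketched with a TTL-bound session store and a persistent preference store. This is an in-memory illustration only: the `AgentMemory` class, its method names, and the sliding-expiry choice are assumptions, not the Redis Iris memory API.

```python
class AgentMemory:
    def __init__(self, session_ttl, clock):
        self.session_ttl = session_ttl
        self.clock = clock      # injectable so the example is deterministic
        self.short_term = {}    # session_id -> (expires_at, [notes])
        self.long_term = {}     # user_id -> {preference: value}

    def recall_session(self, session_id):
        entry = self.short_term.get(session_id)
        if entry is None or entry[0] <= self.clock():
            self.short_term.pop(session_id, None)  # expired or unknown: nothing to return
            return []
        return entry[1]

    def remember(self, session_id, note):
        # Each write refreshes the TTL (a sliding expiry, chosen for illustration).
        notes = self.recall_session(session_id) + [note]
        self.short_term[session_id] = (self.clock() + self.session_ttl, notes)

    def promote(self, user_id, key, value):
        # Promotion is explicit: only what the application chooses survives the TTL.
        self.long_term.setdefault(user_id, {})[key] = value


now = [0]  # fake clock, in seconds
memory = AgentMemory(session_ttl=3600, clock=lambda: now[0])
memory.remember("s1", "asked about metric-sized fasteners")
memory.promote("u1", "units", "metric")

now[0] = 4 * 24 * 3600  # four days later
expired_notes = memory.recall_session("s1")
kept = memory.long_term["u1"]
```

After four days the session notes are gone, while the promoted preference is still available to seed the next session.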
The positioning here relative to other memory solutions is worth noting. mem0 is a dedicated memory layer that sits on top of your stack as a separate service. OpenAI's memory is platform-scoped to their API. Anthropic's is per-session within their API context. Redis Iris memory is co-located with the operational data in the same Redis instance. That co-location simplifies the retrieval path. When the agent needs context from prior sessions, it is making a call to the same system it uses for operational data retrieval, not making a cross-service call to a separate memory store.
The episodic, semantic, and procedural memory architecture framework maps roughly onto the Redis Iris memory tiers. Short-term session memory is episodic. Long-term stored patterns and preferences are closer to semantic memory. Promotion is the mechanism that moves insights from episodic to semantic storage before the short-term TTL expires them.
Component 4: LangCache
LangCache is a semantic cache for LLM responses. Before routing a query to the model, it checks whether a semantically similar query has already been answered and, if the similarity score exceeds a configured threshold, returns the cached response instead of generating a new one. It supports exact string matching for identical queries and semantic similarity search using vector embeddings for queries that are different in phrasing but equivalent in meaning.
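The lookup order described above (exact match first, then similarity against a threshold) can be sketched as follows. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and `SemanticCache` is a hypothetical class, not the LangCache client.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a real deployment would use an embedding model."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []  # (query, embedding, response)

    def put(self, query, response):
        self.entries.append((query, embed(query), response))

    def get(self, query):
        # 1. Exact string match.
        for cached_query, _, response in self.entries:
            if cached_query == query:
                return response
        # 2. Nearest cached query, accepted only above the threshold.
        vector = embed(query)
        best = max(self.entries, key=lambda e: cosine(vector, e[1]), default=None)
        if best is not None and cosine(vector, best[1]) >= self.threshold:
            return best[2]
        return None  # miss: the caller falls through to the model


cache = SemanticCache(threshold=0.75)
cache.put("what is your return policy", "Returns are accepted within 30 days.")
paraphrase_hit = cache.get("What is the return policy?")
unrelated_miss = cache.get("where is my order")
```

The threshold is the only thing separating a paraphrase from a different question, which is why it has to be tuned against real traffic.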
For high-volume pipelines where queries are repetitive, this genuinely matters. Customer support chatbots field the same questions constantly. Product FAQ agents answer the same queries about return policies and shipping timelines hundreds of times a day. For those patterns, LangCache can cut LLM call volume substantially and reduce latency on cache hits to single-digit milliseconds.
I want to be direct about the risk here rather than footnoting it, because I think it is the component where teams are most likely to get burned.
A semantically similar query about product stock answered two hours ago is wrong if inventory changed in the last two hours. The semantic similarity check does not know about your data's volatility. Set the threshold too permissively and you serve stale answers with misplaced confidence. This is the stale state failure mode from Part 1, except now the agent is not just reading old data. It is confidently returning a cached answer synthesized from old data. The failure is harder to detect because it looks like a fast, successful response.
This is not a hypothetical risk. It is the exact type of subtle failure that erodes user trust in production agents, because the system is confidently wrong rather than visibly uncertain. Before putting LangCache in a production path for data that changes frequently, you need explicit staleness policies tied to your data's TTL characteristics. The similarity threshold needs to be tuned against your actual query distribution, not a benchmark. And you need monitoring to catch cases where a high-confidence cache hit was actually serving stale synthesized content. None of this is a default configuration. It is work you do before going live.
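One possible shape for such a staleness policy is a per-topic TTL plus a check against the last known change to the underlying data. The topic names, TTL values, and the `is_servable` helper are illustrative assumptions, not LangCache settings.

```python
# Hypothetical per-topic TTLs, in seconds. Unknown topics get 0, so they are never served.
CACHE_TTL_SECONDS = {
    "return_policy": 24 * 3600,  # slow-moving content
    "stock_level": 120,          # volatile content
}


def is_servable(entry, now, source_changed_at=None):
    """Decide whether a cached answer may be returned.

    entry: dict with "topic" and "cached_at" (seconds).
    source_changed_at: time of the latest known change to the data the
    answer was synthesized from, if the pipeline can supply it.
    """
    ttl = CACHE_TTL_SECONDS.get(entry["topic"], 0)
    if now - entry["cached_at"] >= ttl:
        return False  # too old for this topic
    if source_changed_at is not None and source_changed_at > entry["cached_at"]:
        return False  # the data changed after the answer was generated
    return True


stock_answer = {"topic": "stock_level", "cached_at": 1000}
policy_answer = {"topic": "return_policy", "cached_at": 1000}

fresh = is_servable(stock_answer, now=1060)
changed_since = is_servable(stock_answer, now=1060, source_changed_at=1030)
two_hours_later = is_servable(stock_answer, now=1000 + 2 * 3600)
policy_two_hours_later = is_servable(policy_answer, now=1000 + 2 * 3600)
```

Under this policy the two-hour-old stock answer is rejected while the two-hour-old return-policy answer is still served, which is the distinction a similarity score alone cannot make.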
Redis Flex and the Cost Question
Redis Flex is a storage tier that keeps hot data in RAM and moves colder data to SSD rather than requiring everything to live in memory. For builders running large vector indexes, this changes the cost calculus significantly.
A vector index over millions of embeddings requires substantial memory if everything lives in RAM. At a billion vectors, the in-memory cost is prohibitive for most teams. Redis Flex moves data that is not being actively queried to SSD, keeping the hot set in memory, which makes large-scale vector indexes economically viable without requiring you to size your Redis cluster for the entire dataset in RAM.
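A back-of-the-envelope calculation shows why. The sketch assumes 768-dimensional float32 embeddings, which is an assumption of this example rather than a figure from Redis, and it counts only the raw vectors, not index overhead or replicas.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Bytes needed for the raw vectors alone (float32 by default)."""
    return num_vectors * dims * bytes_per_value


ten_million = raw_vector_bytes(10_000_000, 768)
one_billion = raw_vector_bytes(1_000_000_000, 768)

ten_million_gb = ten_million / 1e9   # about 30.7 GB
one_billion_tb = one_billion / 1e12  # about 3.07 TB
```

Tens of gigabytes fit in RAM on ordinary hardware. Several terabytes of RAM, before any index structures, is where tiering the cold portion to SSD starts to matter.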
The sub-millisecond retrieval claims for billion-scale vector indexes that Redis has been making publicly depend on Flex making that volume affordable, not just technically possible. If your index fits comfortably in RAM, the economics were already fine and Flex is less relevant. It matters when your data volume has outgrown what in-memory storage can cost-effectively hold.
What This Architecture Actually Assumes
I want to name the assumptions explicitly because the architecture looks more automatic than it is when you are reading about it rather than building with it.
Your source databases need to support CDC. Postgres via logical replication, Oracle via LogMiner, and MongoDB via change streams all work. Not every database does, and not every database administrator will approve enabling CDC on a production system without a conversation about performance impact, log retention, and operational monitoring of the pipeline itself.
Entity models need to be defined upfront. Context Retriever generates tools from those models, which means someone has to model the business objects the agent needs, their fields, and their relationships, before the agent can use any of it. When source schemas evolve, the entity models need to evolve with them. This is ongoing schema maintenance work. It accumulates quietly as systems change.
The denormalized Redis copy is shaped for the agent's current access patterns. If those access patterns change significantly, you may need to reshape the copy. This is not necessarily expensive, but it is a dependency you are creating between how you model data for Redis and how your agents are expected to query it.
The payoff is real. Always-fresh data, sub-millisecond retrieval, a unified tool surface, agent memory that compounds across sessions. The operational commitment is also real. This architecture shifts where complexity lives. It does not eliminate complexity.
This is where eval-driven development for agents comes in: measuring whether the architecture is delivering what it promises in production, not just in controlled tests. With a system this multi-layered, evals need to cover the CDC freshness guarantees, the memory retrieval accuracy, and the LangCache staleness policies, not just end-to-end answer quality.
FAQ
What databases does Redis Data Integration support?
RDI currently supports Postgres via logical replication, Oracle via LogMiner, MongoDB via change streams, and Snowflake, among others. The CDC support maturity varies by source database. Postgres with Debezium is the most production-proven path. For less common sources, check the current RDI documentation and confirm CDC support before committing to the architecture.
How does Context Retriever handle schema changes in source databases?
Schema changes in source databases require corresponding updates to entity models. If a field is added to the Orders table in your source database, you need to add it to the Order entity model in Context Retriever before the generated tools will expose it. If a field is renamed or removed, the entity model can drift from the source schema without immediate loud failures. A CDC pipeline may continue syncing while the generated tools quietly reflect the old model. Building explicit schema change processes into your workflow is necessary, not optional.
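One lightweight way to make that drift loud is to compare the source table's columns against the entity model's fields in CI or on a schedule. The `schema_drift` helper and the column names below are hypothetical, and how you obtain the column list depends on your source database.

```python
def schema_drift(source_columns, entity_fields):
    """Report fields that exist on only one side of the source/model boundary."""
    source, model = set(source_columns), set(entity_fields)
    return {
        "missing_in_model": sorted(source - model),   # new columns the tools do not expose
        "missing_in_source": sorted(model - source),  # renamed or removed columns
    }


# Hypothetical example: "state" was renamed to "status" and "gift_note" was added.
orders_table_columns = ["id", "customer_id", "status", "gift_note"]
order_entity_fields = ["id", "customer_id", "state"]

drift = schema_drift(orders_table_columns, order_entity_fields)
has_drift = any(drift.values())
```

Failing a build when `has_drift` is true turns a quiet mismatch into an explicit task for whoever owns the entity models.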
What is the difference between short-term and long-term memory in Redis Iris?
Short-term memory is TTL-based session state. It stores the current conversation history and context, and it expires when the configured TTL is reached. Long-term memory stores patterns, preferences, and insights promoted from short-term sessions. It persists across conversations until explicitly evicted. The mechanism for moving information from short-term to long-term is promotion, which you configure based on what your application decides is worth keeping. An insight about a user's preference might be promoted automatically when it appears consistently. A one-off context note from a session might be allowed to expire.
Can I use LangCache safely for rapidly changing data?
You can, but only with explicit staleness policies. LangCache does not automatically tie cache expiration to your data's change rate. You need to configure cache TTLs and similarity thresholds by query type and entity type. A query about return policy can be cached for a day. A query about current stock levels should have an aggressive TTL, possibly minutes. The configuration is your responsibility. The default is not a production-grade policy for high-velocity data.
How does Redis Iris handle role-level access control?
Access control is defined at the entity model level in Context Retriever. You specify which roles can read which entities, and the tool layer enforces those permissions before a query reaches the underlying data. This means the agent never has to implement access control in its reasoning logic. The tool surface itself is scoped to what the calling role is allowed to see.
What is the relationship between Redis Iris and MCP?
Context Retriever generates MCP-compatible tools from entity models. The agent calls these tools through the MCP protocol, which means Redis Iris is compatible with any agent framework or orchestration system that supports MCP tool calling. The agent does not need to know the underlying Redis data structures or query interfaces. It calls named, typed tools through MCP, and Context Retriever handles the translation to Redis queries.
Is Redis Iris the same as using Redis as a cache in front of Postgres?
The caching pattern is similar in principle but different in scope. Using Redis as a cache in front of Postgres is a well-established pattern for improving read performance on specific queries. Redis Iris takes that pattern and builds a full agent context layer on top of it: CDC-based sync to keep the cache current, entity modeling to shape the data for agent reads, generated MCP tools to expose it as a typed tool surface, and agent memory to persist cross-session context. The data movement mechanism is recognizable. The layer of tooling built on top of it is what is being productized specifically for agent workloads.