Infrastructure
Pattern 13 of 26
Context Management
Fitting the world into a window
Context windows are finite and agentic tasks burn through them fast. Every tool result, every retrieved document, every piece of intermediate reasoning takes up space. Context management is the practice of deciding what the agent actually needs to see right now, and getting rid of everything else. It sounds simple. In a multi-step agent running dozens of tool calls, it is one of the most consequential decisions you will make.
Why it matters
Context management is not a performance optimization. It is a correctness problem. A context full of stale or irrelevant information changes what the model says. I have watched agents give completely different answers to the same question based purely on what else happened to be in the context at that moment.
Deep Dive
Context windows are finite, and agentic tasks burn through them in ways that are easy to underestimate. A multi-step research task accumulates tool results, conversation history, intermediate reasoning steps, and retrieved documents. Each piece of new information displaces something else, or forces the model to reason over an increasingly crowded prompt. Context management is the practice of deciding what the agent actually needs to see right now, discarding what it does not, and pulling in the rest only when needed. This is not just a cost optimization. It directly affects output quality.
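One common way to "discard what it does not need" is a token-budget trim over the conversation history: always keep the system prompt, then keep the most recent turns that fit, dropping older tool results first. The sketch below assumes messages are simple role/content dicts and uses a crude characters-over-four token estimate; the function name and heuristic are illustrative, not from any particular framework.

```python
def fit_to_budget(messages, budget, count_tokens=lambda m: len(m["content"]) // 4):
    """Keep the system prompt plus the most recent messages that fit the budget.

    `messages` is a list of {"role", "content"} dicts; the first entry is
    assumed to be the system prompt and is always kept. Older turns are
    dropped first, since recent tool results usually matter most.
    """
    system, rest = messages[0], messages[1:]
    remaining = budget - count_tokens(system)
    kept = []
    # Walk backwards from the newest message, keeping whatever still fits.
    for msg in reversed(rest):
        cost = count_tokens(msg)
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    return [system] + list(reversed(kept))
```

In a real agent loop you would run this before every model call, and likely summarize the dropped turns into a single compact message rather than deleting them outright.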
The most important finding on this topic came from the Lost in the Middle paper (Liu et al., 2023), which showed that language models systematically under-use information positioned in the middle of long contexts. Performance degrades when the relevant information is not near the beginning or end of the prompt. The implication for how you structure context is direct: put the most important material at the edges of the prompt, not buried in the middle of a long tool result dump. This finding has held up across multiple follow-up studies and still applies to most current models.
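One practical response to this position bias is to reorder retrieval results so the highest-scoring chunks land at the edges of the prompt and the weakest material sits in the middle. The alternating scheme below is one common way to do that (similar ideas appear in retrieval libraries as "long-context reorder" postprocessors); the function name is mine.

```python
def reorder_for_position_bias(chunks):
    """Arrange (score, text) pairs so the best chunks sit at the edges
    of the prompt rather than the middle, where models under-use them.

    Alternate placements: best chunk at the front, second-best at the
    back, third at the front, and so on, pushing the lowest-ranked
    material toward the middle.
    """
    ordered = sorted(chunks, key=lambda c: c[0], reverse=True)
    front, back = [], []
    for i, chunk in enumerate(ordered):
        (front if i % 2 == 0 else back).append(chunk)
    # Reverse the back half so scores rise again toward the end.
    return front + back[::-1]
```

For five chunks ranked a through e, this yields the order a, c, e, d, b: the two strongest chunks bracket the prompt and the weakest one is buried in the middle, matching the model's attention pattern.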
Anthropic's contextual retrieval work (September 2024) tackled a specific failure mode in RAG pipelines. Chunks pulled from a vector database often lack the surrounding context needed to interpret them correctly. A chunk that says "revenue declined 12%" means nothing without knowing which period or which company it refers to. By using the full document to generate a short situating sentence for each chunk and prepending it before embedding, the chunk-level retrieval failure rate dropped by 49% in their tests. The field now treats context engineering as its own sub-discipline, covering compaction strategies, sliding window approaches, and retrieval patterns that are genuinely distinct from the prompt engineering ideas of earlier years.