Infrastructure
Pattern 13 of 26
Context Management
Fitting the world into a window
Context windows are finite and agentic tasks burn through them fast. Every tool result, every retrieved document, every piece of intermediate reasoning takes up space. Context management is the practice of deciding what the agent actually needs to see right now, and getting rid of everything else. It sounds simple. In a multi-step agent running dozens of tool calls, it is one of the most consequential decisions you will make.
Why it matters
Context management is not a performance optimization. It is a correctness problem. A context full of stale or irrelevant information changes what the model says. I have watched agents give completely different answers to the same question based purely on what else happened to be in the context at that moment.
Deep Dive
Context windows are finite, and agentic tasks burn through them in ways that are easy to underestimate. A multi-step research task accumulates tool results, conversation history, intermediate reasoning steps, and retrieved documents. Each piece of new information displaces something else, or forces the model to reason over an increasingly crowded prompt. Context management is the practice of deciding what the agent actually needs to see right now, discarding what it does not, and pulling in the rest only when needed. This is not just a cost optimization. It directly affects output quality.
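One common way to "discard what it does not need" is a token-budget trim over the conversation history: always keep the system prompt, then keep the most recent turns that fit, dropping older tool results first. The sketch below assumes messages are simple role/content dicts and uses a crude characters-over-four token estimate; the function name and heuristic are illustrative, not from any particular framework.

```python
def fit_to_budget(messages, budget, count_tokens=lambda m: len(m["content"]) // 4):
    """Keep the system prompt plus the most recent messages that fit the budget.

    `messages` is a list of {"role", "content"} dicts; the first entry is
    assumed to be the system prompt and is always kept. Older turns are
    dropped first, since recent tool results usually matter most.
    """
    system, rest = messages[0], messages[1:]
    remaining = budget - count_tokens(system)
    kept = []
    # Walk backwards from the newest message, keeping whatever still fits.
    for msg in reversed(rest):
        cost = count_tokens(msg)
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    return [system] + list(reversed(kept))
```

In a real agent loop you would run this before every model call, and likely summarize the dropped turns into a single compact message rather than deleting them outright.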
The most important finding on this topic came from the Lost in the Middle paper (Liu et al., 2023), which showed that language models systematically under-use information positioned in the middle of long contexts. Performance degrades when the relevant information is not near the beginning or end of the prompt. The implication for how you structure context is direct: put the most important material at the edges of the prompt, not buried in the middle of a long tool result dump. This finding has held up across multiple follow-up studies and still applies to most current models.
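One practical response to this position bias is to reorder retrieval results so the highest-scoring chunks land at the edges of the prompt and the weakest material sits in the middle. The alternating scheme below is one common way to do that (similar ideas appear in retrieval libraries as "long-context reorder" postprocessors); the function name is mine.

```python
def reorder_for_position_bias(chunks):
    """Arrange (score, text) pairs so the best chunks sit at the edges
    of the prompt rather than the middle, where models under-use them.

    Alternate placements: best chunk at the front, second-best at the
    back, third at the front, and so on, pushing the lowest-ranked
    material toward the middle.
    """
    ordered = sorted(chunks, key=lambda c: c[0], reverse=True)
    front, back = [], []
    for i, chunk in enumerate(ordered):
        (front if i % 2 == 0 else back).append(chunk)
    # Reverse the back half so scores rise again toward the end.
    return front + back[::-1]
```

For five chunks ranked a through e, this yields the order a, c, e, d, b: the two strongest chunks bracket the prompt and the weakest one is buried in the middle, matching the model's attention pattern.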
Anthropic's contextual retrieval work (September 2024) tackled a specific failure mode in RAG pipelines. Chunks pulled from a vector database often lack the surrounding context needed to interpret them correctly. A chunk that says "revenue declined 12%" means nothing without knowing which period or which company it refers to. By using the full document to generate a short situating sentence for each chunk and prepending it before embedding, the chunk-level retrieval failure rate dropped by 49% in their tests. The field now treats context engineering as its own sub-discipline, covering compaction strategies, sliding window approaches, and retrieval patterns that are genuinely distinct from the prompt engineering ideas of earlier years.