Short answer: Context engineering for AI agents is the discipline of deciding what an agent sees at every step — system instructions, tools, retrieved data, memory, and prior conversation — so the model gets the smallest set of high-signal tokens it needs to act correctly. It is the successor to prompt engineering. Prompt engineering optimized a single message; context engineering optimizes the entire information environment an agent operates inside, across many steps. The reason it matters is counterintuitive: adding more context usually makes agents worse, not better. Models have a finite attention budget, and quality degrades as the window fills — a measured effect researchers now call "context rot." Getting an agent to production is mostly the work of controlling that window.
The gap between an agent that demos well and an agent you can run unattended is, more than anything, a context problem. Below is what context engineering actually involves, why it beats prompt engineering, the failure mode it exists to prevent, the strategies that work, and how we approach it.
From prompt engineering to context engineering
Prompt engineering asked one question: how do I phrase this request so the model gives me what I want? That question made sense when the interaction was a single turn — one carefully worded message, one response. It is still useful. But an agent is not a single turn. An agent loops: it reads a goal, calls a tool, reads the result, reasons, calls another tool, and repeats, sometimes for dozens of steps. By step ten, the original prompt is a small fraction of what the model is actually reading.
Context engineering asks a different question: of everything the agent could see right now, what does it actually need, and what should we strip out? Anthropic's Applied AI team, in a widely read September 2025 guide to the practice, framed the goal precisely: find "the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome." The emphasis on smallest is the whole shift. Prompt engineering was about adding the right words. Context engineering is just as often about removing the wrong ones.
This is not a rebrand. The two operate at different scopes. A prompt is one component of context. Context is the prompt plus the system instructions, the tool definitions, the retrieved documents, the conversation history, the agent's memory, and the output of every tool call so far. You can write a flawless prompt and still get a broken agent if the other components are bloated, stale, or contradictory. In production, they usually are.
Context rot: why more context makes agents worse
The instinct when an agent fails is to give it more — more examples, more documents, more history, more tool options. This instinct is usually wrong, and there is now measurement to prove it.
Chroma's 2025 research formalized a phenomenon it called context rot: as you add tokens to an LLM's input, the quality of its output decreases. The study tested 18 frontier models against increasing context lengths, and every one degraded. Crucially, the degradation was not just about hitting a length limit. The study found that semantically similar but irrelevant content actively misled the models — distractor text that looked relevant pulled answers off course beyond what raw length alone explained.
The numbers are worse than most teams assume. Reported research has found clear accuracy degradation well before advertised limits: some models show it by around 1,000 tokens, and a model rated for 200K tokens of context can begin degrading near 50K. Agentic performance on models advertising windows of 1M tokens or more has reportedly fallen off severely by around 100K tokens. The advertised context window is a capacity, not a promise of quality across that capacity.
And this is not a niche engineering concern; it is a leading cause of failure. Reported analyses attribute a large share of enterprise AI failures in 2025 to context drift or memory loss during multi-step reasoning. That tracks with the broader production gap: Gartner predicts more than 40% of agentic AI projects will be canceled by the end of 2027, and most analyses put the share of agent initiatives that never reach production above 80%. The model is rarely the problem. The context around it usually is.
The attention budget: why "less is more" is literal
The reason context rot happens is mechanical, not mysterious. A transformer model attends across every token in its window. As the window grows, attention spreads thinner — each token gets a smaller share of a fixed budget. A single relevant sentence becomes statistically harder to find when it is surrounded by tens of thousands of low-signal tokens. The model has not "forgotten" the important detail; it has diluted it.
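To make the dilution argument concrete, here is a toy calculation over a single softmax attention distribution. It is a deliberate simplification, not a model of any real transformer: one relevant token with a raw score of 4.0 competes against distractor tokens that each score 1.0 (both numbers are arbitrary assumptions), and, as in softmax attention, the weights must sum to 1.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relevant_weight(num_distractors, relevant_score=4.0, distractor_score=1.0):
    """Attention weight on one relevant token surrounded by lower-scoring distractors.

    Assumption: the raw scores are illustrative and stay fixed as the window grows.
    """
    scores = [relevant_score] + [distractor_score] * num_distractors
    return softmax(scores)[0]

for n in (10, 1_000, 50_000):
    print(f"{n:>6} distractors -> weight on relevant token: {relevant_weight(n):.5f}")
```

With these scores the relevant token's share works out to 1 / (1 + n * e^-3), so it falls from roughly 0.67 with 10 distractors to roughly 0.02 with 1,000 and about 0.0004 with 50,000, even though its own score never changed. Real models have many heads, many layers, and learned scores, so treat this as intuition for the mechanism rather than a prediction of accuracy.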
Treating attention as a budget reframes every decision about what goes into the window. Each token you add has a cost: it competes with every other token for the model's focus. So the question for every piece of context becomes economic — does this earn its place? A redundant tool description, a stale document, three paragraphs of conversation history that no longer matter — each one is not neutral. It actively degrades the signal the model is trying to act on.
This is why a mediocre prompt inside a surgically curated context often beats a brilliant prompt buried in noise. The skill that separates teams getting strong results from teams getting frustrating ones is rarely prompt wording. It is discipline about what enters the window in the first place.
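One way to make the "does this earn its place?" question operational is a token-budgeted packer. The sketch below assumes each candidate piece of context already carries a relevance score from some upstream ranker; the scores, the `ContextItem` type, and the whitespace token counter are all illustrative assumptions. It drops anything below a relevance floor, then greedily admits the items with the best relevance per token until the budget runs out.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ContextItem:
    name: str
    text: str
    relevance: float  # assumed score in [0, 1] from some upstream ranker

def count_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())

def pack_context(items: List[ContextItem], budget: int,
                 min_relevance: float = 0.2) -> Tuple[List[ContextItem], int]:
    """Admit the items with the best relevance per token until the budget is spent."""
    # Low-relevance items are rejected outright: they are not neutral filler.
    candidates = [item for item in items if item.relevance >= min_relevance]
    candidates.sort(key=lambda item: item.relevance / max(count_tokens(item.text), 1),
                    reverse=True)
    chosen, used = [], 0
    for item in candidates:
        cost = count_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen, used

items = [
    ContextItem("task", "Refund order 1234 for the customer", 1.0),
    ContextItem("policy", "Refunds allowed within 30 days of delivery", 0.9),
    ContextItem("old_chat", "hello thanks sure ok " * 50, 0.1),
    ContextItem("faq", "Shipping times vary by region and carrier " * 5, 0.3),
]
chosen, used = pack_context(items, budget=20)
print([item.name for item in chosen], used)
```

Greedy selection by density is a heuristic for what is really a knapsack problem, so it will not always find the optimal set. The point of the sketch is the discipline: every item has to justify its token cost, and low-relevance filler is rejected rather than admitted just because there happens to be room.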
The four strategies: write, select, compress, isolate
The practitioner community — anchored by LangChain's widely cited framing — has converged on four strategies for managing an agent's context. They are not competing approaches; a production agent usually uses all four.
- Write — save context outside the window. Instead of carrying everything in the prompt, the agent writes state, notes, or intermediate results to external storage (a scratchpad, a state object, a persistent store) and pulls them back only when needed. The window stays lean while nothing is lost.
- Select — pull in only what the current step needs. This is retrieval done deliberately: fetch the relevant documents, the relevant memories, and even the relevant tools for this step, rather than loading everything by default. Selecting the right three tools out of forty keeps tool definitions from crowding out the actual task.
- Compress — keep only the tokens required to act. Long conversations get summarized; verbose tool outputs get distilled to their essentials before they re-enter the window. Compression trades a small, controlled loss of detail for a large gain in signal density.
- Isolate — split context across boundaries. When a task involves several distinct jobs — explore a large codebase, parse a long document, write a summary — handing all of them to one agent pollutes its window. Isolation gives each subtask its own agent with a clean context, then combines the results. Each sub-agent works in clear air.
The judgment is in the mix. Write and select keep the window relevant; compress and isolate keep it small. A durable agent applies them continuously, not once at design time.
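The isolate strategy is easy to misread as "use more agents," so here is a minimal sketch of what actually gets isolated: the window. It assumes a hypothetical `Model` callable that maps a list of messages to a reply (a stand-in for a real LLM call). Each subtask gets a brand-new `Agent` with an empty context, and only a truncated result flows back to the coordinating agent; truncation is a crude placeholder for the summarization a real system would do.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model interface: takes the current context, returns a reply.
Model = Callable[[List[str]], str]

@dataclass
class Agent:
    model: Model
    context: List[str] = field(default_factory=list)

    def run(self, instruction: str) -> str:
        self.context.append(instruction)
        reply = self.model(self.context)
        self.context.append(reply)
        return reply

def run_isolated(model: Model, subtasks: List[str], max_chars: int = 200) -> List[str]:
    """Give every subtask its own fresh Agent; return only a truncated result."""
    results = []
    for task in subtasks:
        sub = Agent(model)  # clean window: nothing from other subtasks leaks in
        results.append(sub.run(task)[:max_chars])
    return results

def coordinate(model: Model, goal: str, subtasks: List[str]) -> Agent:
    lead = Agent(model)
    lead.context.append(goal)
    for task, result in zip(subtasks, run_isolated(model, subtasks)):
        # The lead sees each condensed result, never the sub-agent's transcript.
        lead.context.append(f"[{task}] -> {result}")
    return lead

echo: Model = lambda ctx: f"saw {len(ctx)} message(s): {ctx[-1]}"
lead = coordinate(echo, "Summarize the repo", ["explore the codebase", "parse the design doc"])
for line in lead.context:
    print(line)
```

With the fake `echo` model, each sub-agent reports seeing exactly one message, its own instruction, which is the property isolation is meant to guarantee.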
What actually lives in an agent's context window
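To engineer context, you have to see it as the layered thing it is. At any given step, an agent's window typically holds the current prompt plus system instructions, tool definitions, retrieved data, memory, conversation history, and the outputs of earlier tool calls.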
Most context problems are a specific layer misbehaving: tool definitions that are too many, retrieved data that is too broad, history that is never compressed, memory that is never selected. Naming the layers is how you find the leak.
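Finding the leak starts with measuring it. A minimal sketch, assuming the window has already been split into named layers and using whitespace word counts as a rough stand-in for a real tokenizer, is a per-layer token report that flags any layer consuming more than a chosen share of the budget. The 40% threshold, the budget, and the sample layers are arbitrary illustrations.

```python
from typing import Dict, List, Tuple

def count_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())

def layer_report(layers: Dict[str, List[str]], budget: int,
                 max_share: float = 0.4) -> Dict[str, Tuple[int, bool]]:
    """Token count per layer, plus a flag for layers over max_share of the budget."""
    report = {}
    for name, chunks in layers.items():
        tokens = sum(count_tokens(chunk) for chunk in chunks)
        report[name] = (tokens, tokens > max_share * budget)
    return report

window = {
    "system": ["You are a support agent. Be concise."],
    "tools": ["tool: search_orders(query) returns matching orders"] * 40,
    "retrieved": ["Refunds allowed within 30 days of delivery"],
    "history": ["user: where is my order"],
    "memory": ["customer prefers email"],
}
for name, (tokens, flagged) in layer_report(window, budget=400).items():
    print(f"{name:<10} {tokens:>5} {'<- over budget share' if flagged else ''}")
```

In this toy window the tool definitions alone take half of the 400-token budget, which points straight at the tool layer as the leak.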
Context engineering over a long-running agent
The hardest version of the problem is not a single request — it is an agent that runs for hours or across many sessions. Here the window cannot simply accumulate; it would rot long before the task finished. Long-running agents need an explicit lifecycle for context.
In practice that means three recurring moves. The agent offloads state to external memory as it works, so progress survives outside the window (the write strategy). It compacts the running history at intervals — summarizing what happened and dropping the raw transcript — so the window reflects the situation without carrying every token that produced it. And it clears spent context: once a tool result has been used and its conclusion recorded, the verbose original can leave the window. The agent keeps the decision, not the debris.
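The three moves can be sketched as one small context manager. This is a minimal illustration under stated assumptions, not a production design: `summarize` is a hypothetical hook (in practice likely a model call), the `spent` flag assumes the agent marks a tool result once its conclusion is recorded, and the compaction thresholds are arbitrary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Message:
    role: str            # "user", "assistant", or "tool"
    text: str
    spent: bool = False  # True once a tool result's conclusion is recorded elsewhere

@dataclass
class LongRunningContext:
    summarize: Callable[[List[Message]], str]  # hypothetical hook, e.g. a model call
    compact_every: int = 20   # compact when the window reaches this many messages
    keep_recent: int = 5      # most recent messages kept verbatim after compaction
    messages: List[Message] = field(default_factory=list)
    store: Dict[str, str] = field(default_factory=dict)  # external memory outside the window

    def offload(self, key: str, value: str) -> None:
        """Write: persist progress outside the window so it survives compaction."""
        self.store[key] = value

    def add(self, msg: Message) -> None:
        self.messages.append(msg)
        if len(self.messages) >= self.compact_every:
            self.compact()

    def clear_spent(self) -> None:
        """Clear: drop verbose tool results whose conclusion is already recorded."""
        self.messages = [m for m in self.messages if not (m.role == "tool" and m.spent)]

    def compact(self) -> None:
        """Compact: fold older history into a summary, keep recent turns verbatim."""
        self.clear_spent()
        split = max(len(self.messages) - self.keep_recent, 0)
        old, recent = self.messages[:split], self.messages[split:]
        if old:
            summary = Message("assistant", "Summary so far: " + self.summarize(old))
            self.messages = [summary] + recent

ctx = LongRunningContext(summarize=lambda msgs: f"{len(msgs)} earlier messages",
                         compact_every=8, keep_recent=3)
for step in range(6):
    ctx.add(Message("tool", f"raw output {step} " * 100, spent=True))
    ctx.add(Message("assistant", f"step {step}: conclusion recorded"))
    ctx.offload(f"step-{step}", "done")
print([m.text for m in ctx.messages])
```

After six steps the window holds one rolling summary and the last three conclusions; none of the verbose tool output remains, while all six progress markers survive in `store`. The agent keeps the decision, not the debris.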
This lifecycle is also where context engineering meets the rest of getting an agent to production. The same multi-step runs that rot are the ones you need to trace and observe to see where the window went wrong, and the same failures become cases in your eval harness so you can prove a context change actually helped. Context engineering is one face of the same discipline as observability and evals: making an agent's behavior legible, measurable, and improvable rather than hoping it holds.
Common mistakes we see
A handful of context mistakes show up again and again, and they share a root: treating context as something you add to rather than something you curate.
The first is dumping. Teams retrieve everything that might be relevant and let the model sort it out. The model can't. The irrelevant material doesn't sit quietly; it misleads. This is the "Dumb RAG" failure: retrieval that fetches plausibly related but wrong context and degrades the answer.
The second is over-tooling. An agent given forty tools spends a large share of its window on tool descriptions and reasons worse about which to use. Selecting a small, relevant tool set per step is almost always better than exposing the full catalog.
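Per-step tool selection can be as simple as ranking tool descriptions against the current task and exposing only the top few. The sketch below uses plain keyword overlap purely for illustration; a real system might rank with embeddings instead, and the tool catalog here is hypothetical.

```python
import re
from typing import Dict, List, Set

def tokenize(text: str) -> Set[str]:
    # Lowercase alphanumeric runs; underscores split, so "refund_order" -> {"refund", "order"}.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_tools(task: str, tools: Dict[str, str], k: int = 3) -> List[str]:
    """Names of the k tools whose name and description overlap most with the task."""
    task_words = tokenize(task)
    scored = []
    for name, description in tools.items():
        overlap = len(task_words & tokenize(name + " " + description))
        if overlap > 0:  # tools with no overlap are never exposed
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]

catalog = {
    "refund_order": "Issue a refund for an order",
    "lookup_order": "Look up an order by id",
    "send_email": "Send an email to a customer",
    "resize_image": "Resize an image file",
    "query_metrics": "Query dashboard metrics",
}
print(select_tools("Refund order 1234 and email the customer", catalog, k=3))
```

Only the selected definitions would go into the window for this step; the rest of the catalog stays out until a later step needs it.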
The third is never compressing. Conversation and tool history are allowed to grow unbounded until the agent is reasoning over a transcript that is mostly stale. By the time anyone notices, the agent has been quietly degrading for many turns.
The fourth is confusing memory with history. Keeping the full transcript is not memory; it is hoarding. Memory is the deliberate selection of what should persist — a few durable facts — not the refusal to throw anything away.
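The difference between memory and history can be shown in a few lines. In this minimal sketch the transcript keeps every turn, while the hypothetical `Memory` class keeps a capped set of keyed facts where a newer fact replaces an older one. The hand-written (key, fact) pairs are an assumption standing in for whatever extraction step decides what should persist.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Memory:
    """Deliberately selected durable facts; updating a key replaces, never appends."""
    max_facts: int = 10
    facts: Dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, fact: str) -> None:
        self.facts.pop(key, None)  # re-insert so this key counts as most recently updated
        self.facts[key] = fact
        while len(self.facts) > self.max_facts:
            oldest = next(iter(self.facts))  # dicts preserve insertion order
            del self.facts[oldest]

    def render(self) -> List[str]:
        return [f"{key}: {fact}" for key, fact in self.facts.items()]

transcript: List[str] = []  # history: every turn, forever
memory = Memory(max_facts=3)
turns = [
    ("user: my name is Dana", "name", "Dana"),
    ("user: ship to Berlin", "city", "Berlin"),
    ("user: actually ship to Munich", "city", "Munich"),
    ("user: I prefer email", "contact", "email"),
    ("user: thanks!", None, None),
]
for text, key, fact in turns:
    transcript.append(text)
    if key is not None:  # only some turns produce a durable fact
        memory.remember(key, fact)
print(len(transcript), memory.render())
```

The transcript still contains the outdated "Berlin" instruction; memory holds only the current city, which is the version worth carrying into the window.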
How Moai Team approaches this
We treat the context window as the primary engineering surface of an agent, not an afterthought once the prompt is written. From the first prototype we are explicit about what each layer of context contains and why every token in the window earns its place. We design retrieval to select narrowly rather than dump broadly, expose the smallest useful tool set at each step, and build compaction and clearing into long-running agents so they don't rot over a session. We wire this to observability so we can see what was in the window on any bad run, and to evals so a context change is measured rather than assumed. The point is not a clever prompt that wins a demo. It is an agent whose context stays high-signal on the thousandth real request the way it did on the first — which is, in the end, what "production" means.
Frequently Asked Questions
What is context engineering for AI agents?
Context engineering for AI agents is the discipline of curating everything a model sees at each step of an agent's run — system instructions, tool definitions, retrieved data, memory, and conversation history — so it receives the smallest set of high-signal tokens needed to act correctly. It is broader than prompt engineering, which optimizes only a single message; context engineering manages the whole information environment across many steps.
How is context engineering different from prompt engineering?
Prompt engineering optimizes how you phrase one request. Context engineering optimizes everything an agent sees across an entire multi-step run — and often means removing tokens, not adding them. A prompt is just one component of context; the others (tools, retrieved data, history, memory) can break an agent even when the prompt is perfect. As agents loop over many steps, context engineering matters far more than prompt wording.
What is context rot?
Context rot is the measured phenomenon, formalized by Chroma's 2025 research, that LLM output quality decreases as you add tokens to the input. Testing 18 frontier models showed every one degraded as context grew, and that semantically similar but irrelevant content actively misled them. Many models degrade well before their advertised context limits, which is why simply giving an agent more context usually makes it worse.
What are the four context engineering strategies?
Write (save context to external storage so the window stays lean), select (pull in only the documents, memories, and tools the current step needs), compress (summarize history and tool output to keep only essential tokens), and isolate (split a task across separate agents so each works in a clean window). Production agents typically use all four together rather than choosing one.
Moai Team builds AI agents with context engineered from the first prototype — narrow retrieval, lean tool sets, and compaction wired in — so they stay reliable on the thousandth real request, not just the demo. Schedule a call.