On Compaction
Forgetting
Forty minutes into a complex task, an agent starts repeating itself,
re-derives old conclusions, proposes an approach it already tried and abandoned.
Old findings get compressed into a summary, and the summary loses the subtle reasoning. The sharp edges get sanded off. A limit was reached, and so an algorithm guessed at what was important to retain. The agent carries on with a lossy copy of its own thinking, but the work continues.
As users on the outside, sometimes we power through it; sometimes we throw up our hands and start a new session. Until that session degrades too.
The standard read on this: context windows will get bigger. Models will get better at long-range attention. Compaction algorithms will get smarter. The implicit assumption: the architecture is fine, the scarce resource is context. Make that resource less scarce and the problem goes away.
Maybe.
Context preservation techniques
The community keeps evolving clever techniques for managing context.
Sub-agents
Instead of burning the main agent’s context on every sub-task, you spin up a smaller agent with its own context, let it do the work, and return a conclusion. The main agent’s context burns slower because it only absorbs summaries, not the full reasoning process.
This helps. It’s a real improvement. But the main agent is still degrading. You slowed the burn rate, but you didn’t change what’s burning.
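A minimal sketch of the pattern, in Python. `run_agent` is a hypothetical stand-in for whatever fresh-context model call your stack provides, not a real API:

```python
def run_agent(system_prompt: str, task: str) -> str:
    # Toy stand-in: a real implementation would call a model here,
    # in its own fresh context window.
    return f"conclusion for: {task}"

def delegate(main_context: list[str], sub_task: str) -> str:
    # The sub-agent burns its own context doing the full reasoning...
    summary = run_agent("Return only conclusions, not your reasoning.", sub_task)
    # ...and the main agent absorbs one summary line, not the transcript.
    main_context.append(f"[sub-agent] {summary}")
    return summary

main_context = ["original prompt"]
delegate(main_context, "audit the parser for edge cases")
```

The main context grows by one line per sub-task instead of by the sub-agent's whole reasoning process. That is the whole trick, and also its whole limit.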
The Checklist
The second fix is the checklist loop. Write the plan to a file. Give the agent one task at a time. Reload context fresh for each task. Externalize everything to disk so there’s nothing in the context window that needs to survive.
This is genuinely good engineering. It treats the context window as volatile scratch space — you keep emptying it, so rot can’t accumulate. It’s thermostatic control: read the state, compare to the goal, take the next action, repeat. A thermostat doesn’t understand heat transfer. It reads a number and flips a switch. And for a surprising range of real work, that’s sufficient.
But someone has to write the checklist. Hard engineering problems don’t arrive pre-decomposed. When the work reveals that the plan was wrong — when you discover mid-task that your decomposition was about the wrong thing — the checklist can’t adapt. It can change what the agent does next. It can’t change what the agent understands.
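The loop is simple enough to sketch. Everything here is a toy under stated assumptions: `run_agent` stands in for a fresh-context model call, and the plan lives in a JSON file so nothing needs to survive in any context window:

```python
import json
import tempfile
from pathlib import Path

def run_agent(task: str, notes: str) -> str:
    # Toy stand-in for a fresh-context model call.
    return f"done: {task}. "

def checklist_loop(plan_path: Path) -> None:
    # Each pass reloads state from disk, does one task, writes state back.
    plan = json.loads(plan_path.read_text())
    while plan["tasks"]:
        task = plan["tasks"].pop(0)                      # one task at a time
        plan["notes"] += run_agent(task, plan["notes"])  # fresh context per task
        plan_path.write_text(json.dumps(plan))           # state persists on disk

plan_path = Path(tempfile.mkdtemp()) / "plan.json"
plan_path.write_text(json.dumps({"tasks": ["scan inputs", "fix parser"], "notes": ""}))
checklist_loop(plan_path)
```

The context window is treated as volatile: it is filled, used, and discarded on every iteration, and the only durable artifact is the plan file.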
Preemptive compaction
As the agent approaches its context limit, have it write up its current state along with pick-up instructions. Then reboot fresh and have the new agent pick up where the old one left off. More thoughtful than algorithmic compaction. More adaptive than a fixed checklist.
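The mechanism, sketched with stand-ins: `tokens_used` is a crude word count where a real stack would use the model's tokenizer, and the handoff note is a placeholder string that the agent itself would actually write:

```python
def tokens_used(context: list[str]) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return sum(len(chunk.split()) for chunk in context)

def maybe_hand_off(context: list[str], limit: int = 100_000) -> list[str]:
    if tokens_used(context) < int(limit * 0.8):
        return context                       # plenty of room: keep going
    # Near the wall: write up current state and pick-up instructions
    # (placeholder here; the agent composes this itself), and a
    # successor boots fresh from the note alone.
    return ["HANDOFF: current state and pickup instructions"]
```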
The necessary coordinator
Each of these is a genuine improvement on the one before. Sub-agents burn the context slower. The checklist avoids burning it at all for simple tasks. Preemptive compaction manages the burn more gracefully.
Each still assumes there is a prime agent, and that agent's context still holds a special, vaunted status. The necessary coordinator. Without that coordination, the work cannot continue. And so everything else is a strategy for keeping that agent alive and functioning as long as possible.
The context window is finite, so you manage the finite resource.
The prime agent is the thread of reasoning, so you protect the thread.
Fire, uncontained and contained
Fire burning in the open is useful. You can warm yourself by it, cook over it, see by it. But it’s ambient. The heat goes everywhere. You manage it by tending it. More fuel, less fuel, a ring of stones, a cleared area so the sparks don’t catch. Every improvement is a better way of tending the same combustion.
But fire sculpted by a mechanism becomes something else. It is, in one of the oldest senses of the word, ‘an engine’.
The heat goes where the machine directs it. The mechanism doesn’t make the fire hotter or more efficient — it makes the fire’s output structural. The fire does the same thing it always did. The machine is what changed.
In the early 1700s, Thomas Newcomen built one of the first machines to do this with steam. His atmospheric engine pumped water out of coal mines by injecting steam into a cylinder, then injecting cold water to condense it — the condensation created a vacuum, the atmosphere pushed the piston down, water got pumped. It worked for sixty years. But it was roughly 1% thermally efficient, because the cold water cooled the cylinder on every stroke, and most of the fuel went to reheating what had just been cooled. The mechanism that did the work also destroyed the conditions for doing more work.
In the 1760s, James Watt was repairing a model Newcomen engine at the University of Glasgow. He wasn’t trying to build a new kind of engine. He was trying to understand why the model used so much steam. And he noticed where the waste was going: into reheating the cylinder.
His fix was not “make a better cylinder.” It was: stop condensing in the cylinder. Move the condensation to a separate vessel — a condenser — that stays cold while the cylinder stays hot. Each component does one job, in the conditions suited to that job. Efficiency roughly tripled. Not from a better version of Newcomen’s engine, but from a different machine that happened to use the same steam.
Heating and condensation were fighting over the same vessel.
Coordination
Inference and memory are fighting over the same vessel.
The context window is good at inference: hot, expensive, high-bandwidth, the place where reasoning actually happens. It is bad at memory: finite, degrading, lossy under compression. We keep trying to make it do both.
Compaction is the cold water injection. It preserves a lossy version of the agent’s state so inference can continue — but it degrades the context that makes inference valuable. The agent spends tokens re-deriving conclusions, re-orienting in territory it already mapped, reconstructing judgment from summaries of judgment. Every compaction cycle means reheating a cooled cylinder.
- Sub-agents coordinate multiple cylinders in parallel.
- Checklists coordinate multiple cylinders in series.
- Preemptive compaction more elegantly times the cooling.
What would this look like?
If you stop trying to preserve the cylinder — if you accept that the context window is scratch space, not storage — the architecture changes shape on its own.
There’s no prime agent. There’s no single thread of reasoning to protect.
Instead there’s a back-and-forth:
An assessment step reads the original prompt, reads the current state of the work, reads the log of what’s been done. Makes a judgment: what needs to happen next? Dispatches work.
A work step receives a task, does the work, writes its findings somewhere persistent, logs what it did, and yields its final notes to exit.
It doesn’t need to explain its entire reasoning back to the assessor. That’s in the report. The assessor can double-check the agent’s homework if that seems prudent; otherwise it can proceed with more of the high-level task.
This next assessment could be a resumption of the previous assessor’s context. But it doesn’t have to be. It could be a fresh agent that reads the prompt, the current working state, and the log as needed. Fresh context, full fidelity, reading the current state of the world.
Like a shift change on a navy boat:
- You know the standing orders.
- You check the notes from the last shift.
- You check the state of the world.
- Then you get to planning what needs to be done next.
The “thread of reasoning” isn’t in any agent’s head. It’s in the files.
This is not recursive work burning down a single finite resource. It’s a trampoline — a back and forth where each participant starts fresh and reads the current state of the work. Context isn’t precious. It’s a fresh cylinder, heated and ready. What’s scarce is something else entirely: good judgment about what to do next. Discipline. And good judgment comes from clear state and full history, not from a degrading memory of having been there.
The agents are the steam
Watt didn’t just optimize the Newcomen design, but he didn’t reinvent fire or steam either. He changed which parts did what: which parts stay hot, which parts stay cold, and where the work accumulates.
Agents are the steam. You don’t waste them; you choose the right kind for the job. But you don’t design the whole machine around keeping one batch of steam alive. You design the machine so the steam can do its work and be replaced by fresh steam, and the work persists in the parts that were built to hold it.
I don’t know that any of this is right. I’m messing around with a Claude Max subscription just like lots of other people. I am not an AI researcher and I don’t have benchmarks or even a working proof of concept yet.
It’s a premise, a feeling even. It’s a shape I keep noticing where the current approaches all seem to be optimizing within a design that might have the seam in the wrong place.
But maybe the move now is the same as it was in 1765.
Separate the condenser.
Appendix
A Condenser Shape
My hunch about what the condenser looks like:
[A] partitioned shared working state
- Partitioned meaning multiple agents can access and work on different parts without needing to read in the entire corpus
- Shared meaning any agent can write to any portion (sharing efficiently is policy, not mechanism)
[B] a queryable append-only log of what happened
- An immutable log lets the agents and the humans see if someone isn’t playing well with others
- Queryable means that if a new finding challenges an old assumption, old work can be reexamined at full fidelity (like a detective reading cold case files)
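A toy of this shape, to make the two halves concrete. This is an in-memory sketch under assumptions; real versions might be directories of files, a database, or a KV store, and the class name and methods here are invented for illustration:

```python
import time

class Condenser:
    def __init__(self) -> None:
        self.partitions: dict[str, dict] = {}   # [A] partitioned shared working state
        self._log: list[dict] = []              # [B] append-only history

    def write(self, agent: str, partition: str, key: str, value) -> None:
        # Any agent may write to any partition (sharing policy is not
        # enforced here); partitioning keeps each agent's reads small.
        self.partitions.setdefault(partition, {})[key] = value
        # Every write also lands in the immutable log, so agents and
        # humans can see who did what, and when.
        self._log.append({"t": time.time(), "agent": agent,
                          "partition": partition, "key": key, "value": value})

    def query(self, **filters) -> list[dict]:
        # Old work is re-examinable at full fidelity: filter the log
        # by agent, partition, or key, like reading cold case files.
        return [entry for entry in self._log
                if all(entry.get(k) == v for k, v in filters.items())]

board = Condenser()
board.write("agent-1", "parser", "bug", "off-by-one in the lexer")
board.write("agent-2", "parser", "bug", "patched; added regression test")
```

The working state holds only the current answer; the log holds every answer there has ever been, and the second is what lets a new finding send you back to re-read the first at full fidelity.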