How Claude Code Compresses Your Conversation
March 2026
Claude Code runs in your terminal, reads your files, writes code, runs tests, all inside a single continuous conversation. But that conversation has a hard limit: a context window of around 200k tokens. When it fills up, something has to give.
I wanted to know not only what that "something" is, but how it works. Turns out you can read the answer straight out of the binary.
Inside the binary
Claude Code ships as a single executable file, not a folder full of .js files. Under the hood it's still JavaScript, but a bundler packs the source, the Node.js runtime, and all dependencies into one ~200MB file. Most of that is compiled machine code, but the JavaScript source is embedded as plain text.
If you know what to search for, you can find it. grep -boa finds the byte offset of a string match, even in binaries:
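Reproduced here at toy scale, since the real binary is 200MB; the file name and embedded string below are mine, not strings from the binary:

```shell
# Build a toy "binary": a few non-text bytes with a string embedded in the middle.
printf 'garbage\x00\x01COMPACTION_PROMPT\x02trailing' > toy.bin

# -b: print the byte offset, -o: print only the match, -a: treat binary as text
grep -boa 'COMPACTION_PROMPT' toy.bin
# → 9:COMPACTION_PROMPT
```

The number before the colon is the byte offset of the match, which is exactly what you need for the next step.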
115,503,119 is a byte offset. We can use dd to extract the raw bytes around it:
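With the real binary you would pass the offset grep printed (skip=115503119) and some generous count; the toy file below keeps the numbers small:

```shell
# Same technique at toy scale: pull 17 bytes starting at byte offset 9.
printf 'garbage\x00\x01COMPACTION_PROMPT\x02trailing' > toy.bin
dd if=toy.bin bs=1 skip=9 count=17 2>/dev/null
# → COMPACTION_PROMPT
```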
What comes out is readable JavaScript: the compaction prompt, token budget constants, threshold logic, file restoration rules. I spent a while pulling on this thread, searching for related strings and cross-referencing offsets. Everything in this post comes from that process.
What's inside the context window?
Every API call to Claude is the same shape: you send an array of message objects, the model sends back a response. The "context window" is this array, and it has a size limit of around 200k tokens.
Each message in the array has a role and some content. Here's what the array looks like when you ask Claude Code to edit a file:
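A sketch of that array; the shapes follow the Anthropic Messages API, but the contents and the specific tool schema here are illustrative:

```javascript
// Illustrative message array for a simple "edit this file" exchange.
const messages = [
  { role: "user", content: "Change the header color in shared.css" },
  {
    role: "assistant",
    content: [
      { type: "text", text: "I'll read the file first." },
      { type: "tool_use", id: "toolu_01", name: "Read",
        input: { file_path: "shared.css" } },
    ],
  },
  {
    // Tool results come back with role "user": the tool ran on your machine.
    role: "user",
    content: [
      { type: "tool_result", tool_use_id: "toolu_01",
        content: ".header { color: #333; }\n/* …the entire file, verbatim… */" },
    ],
  },
];
// The ~16k-token system prompt rides alongside as a separate field on every call.
console.log(messages.length); // 3
```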
The system prompt is invisible and always present. Claude Code injects it at the start of every request: about 800 lines of instructions, tool definitions, and your CLAUDE.md config. You never see it in the terminal, but it costs ~16k tokens, and it's there on every single API call.
Why tool results are "user" messages: Notice that tool results have role: "user". That's because tools execute on your machine, outside the model. Claude asks to read a file, your computer does it, and the contents come back as if you sent them.
Tool calls and tool results are just messages. When Claude decides to read a file, it outputs a tool_use block, basically "call Read with this path." That's tiny, maybe 85 tokens. But the result comes back as a tool_result message containing the entire file, verbatim. A 142-line CSS file is 4,800 tokens. A large source file can be 10k+.
The entire array gets sent on every turn. This is the crucial part. Claude doesn't have memory between turns. It re-reads the whole conversation from scratch on every API call. The array never shrinks. Every file ever read, every command output, every response is still in there. It only grows.
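A toy loop makes the one-way growth concrete; the token numbers are illustrative, loosely based on the sizes quoted above:

```javascript
// Every API call re-sends the full array; nothing is ever removed.
const messages = [];
let contextTokens = 16_000; // system prompt, paid on every call

function sendTurn(message, tokens) {
  messages.push(message); // the array only grows
  contextTokens += tokens;
  console.log(`sending ${messages.length} messages, ~${contextTokens} tokens`);
}

sendTurn({ role: "user", content: "fix the header color" }, 60);
sendTurn({ role: "assistant", content: "[tool_use: Read shared.css]" }, 85);
sendTurn({ role: "user", content: "[tool_result: 142 lines of CSS]" }, 4_800);
// After three turns we're at ~20,945 tokens, and that CSS file stays forever.
```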
Below is the conversation as a tape. Each segment is a message, sized by its token cost:
Here's what that growth looks like over eight calls:
Here's the breakdown at that point:
Nearly half the window is tool results. The system prompt that dominated early on is now just 8%.
The auto-compact trigger
Claude Code doesn't wait until the context window is completely full. The threshold is a budget, not a percentage. The code reserves room for two things:
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: Override the threshold with an environment variable. Set it to a number between 0 and 100 and the threshold becomes that percentage of the effective window. Lower values mean more frequent, smaller compactions.
First, the code carves out space for the model's response (up to 20k tokens). Then it keeps a 13k-token buffer on top of that. For a 200k context window, auto-compact fires at roughly 167k tokens, about 83% full. The question the code asks is: "do I have less than 33k tokens of room left to work with?"
DISABLE_AUTO_COMPACT: Set to true to turn off auto-compact entirely while still letting you /compact manually. DISABLE_COMPACT=true kills compaction completely, even manual.
Here's what that budget looks like as actual space inside the 200k window:
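The arithmetic, as a sketch using the constants above (the variable and function names are mine):

```javascript
// Auto-compact budget, per the constants described in the post.
const CONTEXT_WINDOW = 200_000;
const MAX_OUTPUT_RESERVE = 20_000; // room for the model's response
const SAFETY_BUFFER = 13_000;      // headroom on top of that

function autoCompactThreshold(overridePct = null) {
  if (overridePct !== null) {
    // CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: a percentage of the window
    return CONTEXT_WINDOW * (overridePct / 100);
  }
  return CONTEXT_WINDOW - MAX_OUTPUT_RESERVE - SAFETY_BUFFER; // 167,000
}

console.log(autoCompactThreshold());   // 167000, roughly 83% of 200k
console.log(autoCompactThreshold(50)); // 100000, compact at half-full
```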
How compaction works
When compaction fires, the model doesn't just chop off old messages. It does something more interesting. First, an analysis scratchpad: which files are still relevant, which errors are resolved, what the user actually wants. Then a structured 9-section summary that replaces the entire conversation.
Watch the left side. "Files" pulls from three different messages. "Pending" is inferred: the model connects the assistant's note about hex literals to the user's unresolved complaint and recognizes unfinished work. Everything else fades. File contents, intermediate reasoning, tool call details, all compressible.
Sections 1–6 capture what happened: the goal, the tech, the files, what broke, what was tried, and every user message verbatim. Sections 7–9 capture what's next: unfinished work, current state, next action. Two-thirds backward, one-third forward. Less a transcript, more a handoff note.
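As a data structure, the template might look like this; the titles of sections 1 and 6 are quoted elsewhere in the post, the rest are my paraphrases of the description above:

```javascript
// The nine-section summary template as described (most titles paraphrased).
const BACKWARD = [
  "1. Primary Request and Intent", // the goal
  "2. Key Technical Concepts",     // the tech
  "3. Files and Code Sections",    // the files
  "4. Errors and Fixes",           // what broke
  "5. Problem Solving",            // what was tried
  "6. All User Messages",          // verbatim
];
const FORWARD = [
  "7. Pending Tasks",              // unfinished work
  "8. Current Work",               // current state
  "9. Next Step",                  // next action
];
// Two-thirds backward, one-third forward: a handoff note, not a transcript.
console.log(BACKWARD.length / (BACKWARD.length + FORWARD.length));
```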
The compaction instructions demand concrete detail: "Include specific code snippets, file paths with line numbers, exact function signatures, and error messages rather than general descriptions." Section 03 doesn't say "edited some CSS." It says shared.css:1-8 with the exact change.
The compact API call
Compaction is itself an API call, the model summarizing its own conversation. A deliberately constrained one:
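A hedged sketch of the request shape, following the constraints described below; every value here is an assumption or placeholder, not an extracted constant:

```javascript
// Sketch of the compaction call: same conversation, no tools, tight budget.
const conversation = [
  { role: "user", content: "…the entire conversation so far…" },
];
const compactRequest = {
  model: "claude-…",                  // same model; exact id unknown
  system: "…the compaction prompt…",  // replaces the normal system prompt
  messages: [
    ...conversation,
    { role: "user", content: "…summarization instructions…" },
  ],
  tools: [],                          // no tool use
  max_tokens: 16_000,                 // tight output budget (a guess)
};
console.log(compactRequest.tools.length); // 0: read and write text, nothing else
```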
Everything gets stripped away except the ability to read and write text. The model can't use tools, can't think step-by-step, can't see images or documents. It gets the full conversation, a tight output budget, and one job: compress.
There are also two versions of the analysis phase instructions. The full version demands a chronological walkthrough with code snippets and function signatures. The lean version (behind a feature flag) treats analysis as a "planning scratchpad".
After the model writes both tags, the <analysis> block is stripped entirely. Regex-replaced with an empty string. Only the <summary> survives:
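A minimal sketch of that strip; the exact regex in the binary may differ:

```javascript
// Drop the <analysis> scratchpad, keep only the <summary>.
const modelOutput = [
  "<analysis>Files still relevant: shared.css. Error resolved…</analysis>",
  "<summary>1. Primary Request and Intent: …</summary>",
].join("\n");

const kept = modelOutput.replace(/<analysis>[\s\S]*?<\/analysis>/g, "").trim();
console.log(kept); // only the <summary> block survives
```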
If the summary fails to stream (network issue, model hiccup) it retries once. And if the post-compact token count still exceeds the threshold, compaction triggers again on the very next turn, chaining until it fits.
The full compression pipeline: the whole conversation goes in, the model writes an analysis scratchpad and a nine-section summary, the analysis is stripped out, and the result is retried or re-compacted until it fits.
The continuation prompt
After compaction, the old conversation is gone. In its place, the model receives a single user message that starts with:
"Pick up the last task as if the break never happened." If compaction works well, you shouldn't notice it at all.
What gets restored
This is the part that surprised me. After compaction, the model doesn't just get a summary. The system re-attaches several things automatically:
File restoration is the big one. It re-reads up to 5 recently accessed files (capped at 5k tokens each, 50k total) and attaches them to the context. The files you were actively editing are present as actual content, not just path names in a summary.
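A sketch of the selection logic those caps imply; the function and variable names are mine:

```javascript
// Restore up to 5 recent files, at most 5k tokens each, 50k tokens total.
const MAX_FILES = 5;
const PER_FILE_TOKEN_CAP = 5_000;
const TOTAL_TOKEN_CAP = 50_000;

function selectFilesToRestore(recentFiles) {
  const restored = [];
  let total = 0;
  for (const file of recentFiles) {
    if (restored.length === MAX_FILES) break;
    const cost = Math.min(file.tokens, PER_FILE_TOKEN_CAP); // truncate big files
    if (total + cost > TOTAL_TOKEN_CAP) break;
    restored.push({ path: file.path, tokens: cost });
    total += cost;
  }
  return restored;
}

const picked = selectFilesToRestore([
  { path: "shared.css", tokens: 4_800 },
  { path: "app.js", tokens: 12_000 }, // capped to 5,000
  { path: "notes.md", tokens: 900 },
]);
console.log(picked.map((f) => f.path));
```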
Background task statuses, invoked skills, and plan mode state are also re-attached.
What gets lost
Even with file restoration, compression is lossy. The summary preserves state over story, and some things don't fit neatly into either.
Pre-compaction safety net: There's even a feature flag (tengu_summarize_tool_results) that, when enabled, instructs the model: "write down any important information you might need later in your response, as the original tool result may be cleared later." A mitigation against its own compression system. The model is told to save its notes before the conversation gets compacted.
The template keeps state (what files exist, what's broken, what's next) and drops story (how you got there, what you tried first, what you talked about along the way).
Why it matters
Understanding this system changed how I work with Claude Code. A few things I do differently now.
Section 1 captures "primary request and intent," so stating your goal clearly at the start means it survives compaction. Vague requests get vaguely summarized.
Explicit preferences like "always use single quotes" or "never auto-commit" get captured in section 6, "All User Messages," and persist across compactions. Implicit preferences shown through example are more likely to get lost.
You can also guide what the summary focuses on by passing instructions directly to /compact, for example /compact focus on the failing tests. Whatever you pass gets appended to the summary prompt. Useful when switching from debugging to feature work.
The threshold is tunable too. If you want compaction to kick in earlier or later:
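For example, in your shell profile (the values here are illustrative):

```shell
# Compact at 50% of the window instead of the default ~83%
export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50

# Or turn auto-compact off entirely, keeping manual /compact available
export DISABLE_AUTO_COMPACT=true
```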
The context window is a conveyor belt, not a wall. Understanding the machinery underneath helps you work with it instead of against it.