How Claude Code Compresses Your Conversation

March 2026

Claude Code runs in your terminal, reads your files, writes code, runs tests, all inside a single continuous conversation. But that conversation has a hard limit: a context window of around 200k tokens. When it fills up, something has to give.

I wanted to know not only what that "something" is, but how it works. Turns out you can read the answer straight out of the binary.

Inside the binary

Claude Code ships as a single executable file, not a folder full of .js files. Under the hood it's still JavaScript, but a bundler packs the source, the Node.js runtime, and all dependencies into one ~200MB file. Most of that is compiled machine code. But the JavaScript source is embedded in plain text:

[byte layout: compiled machine code · embedded JavaScript source · more compiled machine code]

If you know what to search for, you can find it. grep -boa finds the byte offset of a string match, even in binaries:

$ grep -boa "Primary Request" claude
115503119:Primary Request

115,503,119 is a byte offset. We can use dd to extract the raw bytes around it:

$ dd if=claude bs=1 skip=115498000 count=15000 2>/dev/null \
    | tr '\0' '\n' | grep -v '^$'

What comes out is readable JavaScript: the compaction prompt, token budget constants, threshold logic, file restoration rules. I spent a while pulling on this thread, searching for related strings and cross-referencing offsets. Everything in this post comes from that process.

What's inside the context window?

Every API call to Claude is the same shape: you send an array of message objects, the model sends back a response. The "context window" is this array, and it has a size limit of around 200k tokens.

Each message in the array has a role and some content. Here's what the array looks like when you ask Claude Code to edit a file:

messages = [
  { role: "system",    content: "You are Claude Code..." },
  { role: "user",      content: "Add dark mode to the blog" },
  { role: "assistant", content: "I'll read your styles first." },
  { role: "assistant", content: { tool_use: Read("shared.css") } },
  { role: "user",      content: { tool_result: "*, *::before, *::after..." } },
  { role: "assistant", content: "Found hardcoded colors, replacing..." },
  // ...this array keeps growing with every turn
]

The system prompt is invisible and always present. Claude Code injects it at the start of every request: about 800 lines of instructions, tool definitions, and your CLAUDE.md config. You never see it in the terminal, but it costs ~16k tokens, and it's there on every single API call.

Why tool results are "user" messages
Notice that tool results have role: "user". That's because tools execute on your machine, outside the model. Claude asks to read a file, your computer does it, and the contents come back as if you sent them.

Tool calls and tool results are just messages. When Claude decides to read a file, it outputs a tool_use block, basically "call Read with this path." That's tiny, maybe 85 tokens. But the result comes back as a tool_result message containing the entire file, verbatim. A 142-line CSS file is 4,800 tokens. A large source file can be 10k+.

The entire array gets sent on every turn. This is the crucial part. Claude doesn't have memory between turns. It re-reads the whole conversation from scratch on every API call. The array never shrinks. Every file ever read, every command output, every response is still in there. It only grows.
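That growth is easy to model. A rough sketch, using the crude chars÷4 token heuristic (the message contents here are made up, not measured):

```python
def context_size(messages, chars_per_token=4):
    """Rough token estimate for a message array that is re-sent in full on every call."""
    return sum(len(m["content"]) // chars_per_token for m in messages)

history = []
sizes = []
# A short question, a long reply, then a big tool result: typical turn shapes.
for content in ["Add dark mode", "reply " * 200, "tool result: entire file " * 400]:
    history.append({"role": "user", "content": content})
    sizes.append(context_size(history))

# The array never shrinks, so the per-call cost can only go up.
assert sizes == sorted(sizes)
```

Each call pays for everything that came before it, which is why a handful of large file reads dominate the window so quickly.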

Below is the conversation as a tape. Each segment is a message, sized by its token cost:

[animated figure: the conversation tape. call 1: 16k tokens, call 2: 21k, call 3: 22k, call 4: 36k]

Here's what that growth looks like over eight calls:

[animated figure: the context window filling up across calls, from 0 tokens toward the 200k limit]

Here's the breakdown at that point:

[animated figure: full session breakdown by message type: system, user, assistant, tool calls, tool results, free]

Nearly half the window is tool results. The system prompt that dominated early on is now just 8%.

The auto-compact trigger

Claude Code doesn't wait until the context window is completely full. The threshold is a budget, not a percentage. The code reserves room for two things:

CLAUDE_AUTOCOMPACT_PCT_OVERRIDE
Override the threshold with an environment variable. Set it to a number between 0 and 100 and the threshold becomes that percentage of the effective window. Lower values mean more frequent, smaller compactions.

First, it carves out space for the model's response (up to 20k tokens). Then it keeps a 13k-token buffer on top of that. For a 200k context window, auto-compact fires at roughly 167k tokens, about 83% full. The question the code asks is: "do I have less than 33k tokens of room left to work with?"

DISABLE_AUTO_COMPACT
Set to true to turn off auto-compact entirely while still letting you /compact manually. DISABLE_COMPACT=true kills compaction completely, even manual.
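Put together, the trigger arithmetic reduces to a few lines. A sketch of the logic described above (the function name is mine, and treating "effective window" as the raw window for the percentage override is an assumption):

```python
def autocompact_threshold(window=200_000, output_reserve=20_000,
                          buffer=13_000, pct_override=None):
    """Token count at which auto-compact fires.

    pct_override mirrors CLAUDE_AUTOCOMPACT_PCT_OVERRIDE (0-100);
    without it, the threshold is the window minus the two reserves.
    """
    if pct_override is not None:
        return int(window * pct_override / 100)
    return window - output_reserve - buffer

assert autocompact_threshold() == 167_000                 # ~83% of 200k
assert autocompact_threshold(pct_override=60) == 120_000  # compact earlier
```

The "less than 33k of room left" question in the text is the same check from the other direction: 20k output reserve plus the 13k buffer.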

Here's what that budget looks like as actual space inside the 200k window:

[animated figure: the 200k window. The conversation grows from 0; auto-compact triggers at ~167k, leaving a 13k buffer and 20k of output space]

How compaction works

When compaction fires, the model doesn't just chop off old messages. It does something more interesting. First, an analysis scratchpad: which files are still relevant, which errors are resolved, what the user actually wants. Then a structured 9-section summary that replaces the entire conversation.

Compaction, before and after.

The conversation (6 messages, ~22k tokens):

user: "Add dark mode to the blog, all pages"
assistant: Reading current styles to understand the color system...
  Read shared.css — 142 lines, hardcoded #ffffff backgrounds
assistant: Replacing hardcoded colors with CSS vars. Also found post-arms.js uses hex literals — fixing those too.
  Edit shared.css updated — added :root vars, swapped 14 values
user: "The canvas colors are still light"

The 9-section summary (~1,400 tokens):

what happened
01 Intent: Add dark mode to all pages of the blog
02 Tech: CSS custom properties, JS canvas hex colors
03 Files: shared.css:1-8 — CSS vars added. post-arms.js — hardcoded canvas colors
04 Errors: Canvas still light — CSS vars don't reach canvas API
05 Solving: CSS custom properties over class-toggle
06 User: "Add dark mode, all pages" · "Canvas colors still light"

what's next
07 Pending: Fix canvas hex literals in post-arms.js
08 Current: Canvas still has light colors — replace #fff/#000 in JS
09 Next: Read post-arms.js, replace hardcoded colors with getComputedStyle

~22,000 tokens → ~1,400 tokens (93% smaller)

Look at the conversation side. "Files" pulls from three different messages. "Pending" is inferred: the model connects the assistant's note about hex literals to the user's unresolved complaint and recognizes unfinished work. Everything else fades: file contents, intermediate reasoning, tool call details, all compressible.

Sections 1–6 capture what happened: the goal, the tech, the files, what broke, what was tried, and every user message verbatim. Sections 7–9 capture what's next: unfinished work, current state, next action. Two-thirds backward, one-third forward. Less a transcript, more a handoff note.

The compaction instructions demand concrete detail: "Include specific code snippets, file paths with line numbers, exact function signatures, and error messages rather than general descriptions." Section 03 doesn't say "edited some CSS." It says shared.css:1-8 with the exact change.
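For reference, the nine sections as a checklist (the names are paraphrased from the example above, not the exact strings in the binary):

```python
SUMMARY_SECTIONS = [
    # Sections 1-6: what happened
    "Primary request and intent",
    "Key technical concepts",
    "Files and code sections",
    "Errors and fixes",
    "Problem solving",
    "All user messages",
    # Sections 7-9: what's next
    "Pending tasks",
    "Current work",
    "Next step",
]

assert len(SUMMARY_SECTIONS) == 9
```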

The compact API call

Compaction is itself an API call, the model summarizing its own conversation. A deliberately constrained one:

Normal call vs. compact call

Normal API call:
  • Extended thinking
  • Tool use: Read, Edit, Bash, Grep...
  • Images & documents
  • Up to 64k output tokens
  • Your system prompt (~16k tokens)

Compact call:
  • Extended thinking: disabled
  • Tool use: denied entirely
  • Images & documents: replaced with [image] / [document]
  • Max 20k output tokens
  • New system prompt: "Summarize this conversation."

Everything gets stripped away except the ability to read and write text. The model can't use tools, can't think step-by-step, can't see images or documents. It gets the full conversation, a tight output budget, and one job: compress.
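A sketch of how such a constrained request might be assembled (the field names follow the Anthropic Messages API shape, but the helper and its substitution logic are illustrative, not extracted from the binary):

```python
def build_compact_request(messages, summary_prompt):
    """Build a deliberately stripped-down summarization call."""
    def strip_media(m):
        # Non-text content is replaced with a plain placeholder.
        content = m["content"]
        if isinstance(content, dict) and "image" in content:
            content = "[image]"
        return {**m, "content": content}

    return {
        "system": summary_prompt,   # replaces the ~16k-token system prompt
        "messages": [strip_media(m) for m in messages],
        "max_tokens": 20_000,       # tight output budget
        "tools": [],                # tool use denied entirely
    }

req = build_compact_request(
    [{"role": "user", "content": {"image": "raw bytes..."}}],
    "Summarize this conversation.",
)
assert req["messages"][0]["content"] == "[image]"
```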

There are also two versions of the analysis phase instructions. The full version demands a chronological walkthrough with code snippets and function signatures. The lean version (behind a feature flag) treats analysis as a "planning scratchpad":

<analysis>
Treat this as a private planning scratchpad — it is not the place for content meant to reach the user. Use it to plan, not to draft.
- Walk through chronologically and note what belongs in each of the 9 sections below
- Do NOT write code snippets here — save those for <summary> where they will actually be kept
The goal of <analysis> is coverage, not detail. The detail goes in <summary>.
</analysis>

After the model writes both tags, the <analysis> block is stripped entirely. Regex-replaced with an empty string. Only the <summary> survives:

<analysis> scratchpad notes... </analysis>   → stripped
<summary> 9-section summary... </summary>    → kept
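The stripping step is essentially a one-liner. A sketch of the regex replace described above (the exact pattern in the binary may differ):

```python
import re

def strip_analysis(text):
    """Drop the private scratchpad; keep only the summary."""
    return re.sub(r"<analysis>.*?</analysis>", "", text, flags=re.DOTALL).strip()

out = strip_analysis(
    "<analysis>\nscratchpad notes...\n</analysis>\n"
    "<summary>9-section summary...</summary>"
)
assert out == "<summary>9-section summary...</summary>"
```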

If the summary fails to stream (network issue, model hiccup) it retries once. And if the post-compact token count still exceeds the threshold, compaction triggers again on the very next turn, chaining until it fits.
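The fires-again-until-it-fits behavior can be sketched as a loop (this harness is hypothetical; `summarize` stands in for the compact API call, and the round cap is my own guard, not a known constant):

```python
def compact_until_fits(tokens, summarize, threshold=167_000, max_rounds=5):
    """Re-run compaction while the context still exceeds the threshold."""
    rounds = 0
    while tokens > threshold and rounds < max_rounds:
        tokens = summarize(tokens)  # each pass replaces history with a summary
        rounds += 1
    return tokens, rounds

# A summarizer that shrinks the context to a third each pass:
tokens, rounds = compact_until_fits(600_000, summarize=lambda t: t // 3)
assert (tokens, rounds) == (66_666, 2)  # 600k -> 200k (still over) -> 66,666
```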

The full compression pipeline:

Compression in action:

conversation = [
  { role: "system",    content: "You are Claude Code, an AI assistant..." },  // ~16k tokens
  { role: "user",      content: "Add dark mode to the blog, all pages" },
  { role: "assistant", content: "Reading current styles to understand the color system..." },
  { role: "assistant", content: { tool_use: Read("shared.css") } },
  { role: "user",      content: { tool_result: "shared.css — 142 lines, *, *::before { box-sizing... :root { --bg: #ffffff..." } },
  { role: "assistant", content: "Replacing hardcoded colors. Found post-arms.js uses hex literals — fixing those too." },
  { role: "assistant", content: { tool_use: Edit("shared.css", { old: "#ffffff", new: "var(--bg)" ... }) } },
  { role: "user",      content: { tool_result: "shared.css updated — added :root color vars, swapped 14 values" } },
  { role: "user",      content: "The canvas colors are still light" }
]
→ compressed into the 9-section summary

The continuation prompt

After compaction, the old conversation is gone. In its place, the model receives a single user message that starts with:

This session is being continued from a previous conversation that ran out of context. The summary below covers the earlier portion of the conversation.

[...structured summary...]

Continue the conversation from where it left off without asking the user any further questions. Resume directly — do not acknowledge the summary, do not recap what was happening, do not preface with "I'll continue" or similar. Pick up the last task as if the break never happened.

"Pick up the last task as if the break never happened." If compaction works well, you shouldn't notice it at all.

What gets restored

This is the part that surprised me. After compaction, the model doesn't just get a summary. The system re-attaches several things automatically:

Post-compact context

What you'd expect:
  • Summary
  • Free context

What actually happens:
  • Summary
  • Files re-read
  • Hooks & skills
  • Tools
  • Free context

File restoration is the big one. It re-reads up to 5 recently accessed files (capped at 5k tokens each, 50k total) and attaches them to the context. The files you were actively editing are present as actual content, not just path names in a summary.
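The restoration limits read like this sketch (the constants come from the paragraph above; the selection order and truncation behavior are assumptions):

```python
def restore_files(recent_files, max_files=5, per_file_cap=5_000, total_cap=50_000):
    """Pick files to re-attach after compaction.

    recent_files: list of (path, token_count), most recently accessed first.
    """
    restored, total = [], 0
    for path, tokens in recent_files[:max_files]:
        tokens = min(tokens, per_file_cap)  # truncate oversized files
        if total + tokens > total_cap:
            break
        restored.append((path, tokens))
        total += tokens
    return restored

# An oversized file is truncated to the per-file cap; small ones pass through.
assert restore_files([("shared.css", 12_000), ("post-arms.js", 2_000)]) == \
    [("shared.css", 5_000), ("post-arms.js", 2_000)]
```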

Background task statuses, invoked skills, and plan mode state are also re-attached.

What gets lost

Even with file restoration, compression is lossy. The summary preserves state over story, and some things don't fit neatly into either.

What survives compaction

Survives:
  • File paths + line numbers
  • Function signatures
  • Error messages
  • User preferences
  • Task lists
  • Architectural decisions
  • Recently read file contents (restored)

Gets fuzzy:
  • Nuanced reasoning chains
  • Rejected alternatives
  • Why one approach over another
  • Multi-step debugging context

Dropped:
  • Casual asides
  • Tangential discussions
  • "What if..." exploration
  • File contents not recently read
  • Images and documents (→ [image])
Pre-compaction safety net
There's even a feature flag (tengu_summarize_tool_results) that, when enabled, instructs the model: "write down any important information you might need later in your response, as the original tool result may be cleared later." A mitigation against its own compression system. The model is told to save its notes before the conversation gets compacted.

The template keeps state (what files exist, what's broken, what's next) and drops story (how you got there, what you tried first, what you talked about along the way).


Why it matters

Understanding this system changed how I work with Claude Code. A few things I do differently now.

Section 1 captures "primary request and intent," so stating your goal clearly at the start means it survives compaction. Vague requests get vaguely summarized.

Explicit preferences like "always use single quotes" or "never auto-commit" get captured in section 6, "All User Messages," and persist across compactions. Implicit preferences shown through example are more likely to get lost.

You can also guide what the summary focuses on with /compact:

$ /compact focus on the test failures and the auth refactor
# or
$ /compact include file reads verbatim, remember the CSS bugs

These get appended to the summary prompt. Useful when switching from debugging to feature work.
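Mechanically, that's just concatenation onto the summary prompt. A sketch (the joining text is a guess, not the binary's wording):

```python
def compact_prompt(base_prompt, user_focus=None):
    """Append /compact arguments to the summarization prompt, if given."""
    if user_focus:
        return f"{base_prompt}\n\nAdditional instructions from the user:\n{user_focus}"
    return base_prompt

p = compact_prompt("Summarize this conversation.",
                   "focus on the test failures and the auth refactor")
assert "test failures" in p
```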

The threshold is tunable too. If you want compaction to kick in earlier or later:

# Compact earlier (at 60% of effective window)
$ CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60 claude

# Never auto-compact, only manually
$ DISABLE_AUTO_COMPACT=true claude

The context window is a conveyor belt, not a wall. Understanding the machinery underneath helps you work with it instead of against it.