
# 4.5 Compaction: Context Window Management

> **Model**: claude-opus-4-6 (anthropic/claude-opus-4-6) **Generation Date**: 2025-02-17

***

Compaction (context compression) is OpenCode's key mechanism for staying within the LLM context window limit. When the conversation history grows too long, Compaction automatically clears old tool outputs and, if that is not enough, replaces older messages with a summary so that the conversation can continue.

## 4.5.1 Why Is Compaction Needed?

> **Extended Explanation: The LLM Context Window**
>
> Every LLM has a **context window** limit -- the maximum number of tokens it can process in a single request. For example:
>
> | Model         | Context Window   |
> | ------------- | ---------------- |
> | Claude Sonnet | 200,000 tokens   |
> | GPT-4o        | 128,000 tokens   |
> | Gemini Pro    | 2,000,000 tokens |
>
> In Agentic Coding scenarios, the context can fill up quickly:
>
> * System Prompt: \~2,000 tokens
> * AGENTS.md instructions: \~1,000-5,000 tokens
> * Messages per conversation turn: \~500-2,000 tokens
> * Output from each tool call: \~500-50,000 tokens (can be very large when reading large files)
>
> A 10-turn Agent conversation can easily consume 50,000+ tokens. When the cumulative token count exceeds the context window, the provider rejects the request with an error.

The goal of Compaction is: **compress the conversation history to free up context space, without losing critical information**.
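
To make the growth estimate above concrete, the following is a minimal sketch that adds up a fixed overhead and a per-turn cost. The `TurnCost` shape, the `estimateContextUsage` helper, and the specific numbers are hypothetical; they are picked from within the rough ranges listed above and are not measurements.

```typescript
// Hypothetical per-turn costs, picked from the rough ranges listed above.
interface TurnCost {
  messageTokens: number     // user + assistant text for one turn
  toolOutputTokens: number  // combined output of the tool calls in that turn
}

// Sum the fixed overhead plus the cost of every turn.
function estimateContextUsage(fixedOverhead: number, turns: TurnCost[]): number {
  return turns.reduce(
    (sum, turn) => sum + turn.messageTokens + turn.toolOutputTokens,
    fixedOverhead,
  )
}

// 2,000 (system prompt) + 3,000 (AGENTS.md) of fixed overhead,
// then 10 turns of 1,000 message tokens and 4,000 tool-output tokens each.
const tenTurns: TurnCost[] = Array.from({ length: 10 }, () => ({
  messageTokens: 1_000,
  toolOutputTokens: 4_000,
}))

const used = estimateContextUsage(5_000, tenTurns)
console.log(used) // 55000
```

Under these assumptions, ten modest turns already use more than a quarter of a 200,000-token window, and a single large file read can add tens of thousands of tokens on top.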

## 4.5.2 The Overflow Detection Algorithm: `isOverflow()`

`SessionCompaction.isOverflow()` determines whether the current conversation is about to exceed the context window:

```typescript
const COMPACTION_BUFFER = 20_000  // Reserve a 20K token buffer

export async function isOverflow(input: {
  tokens: MessageV2.Assistant["tokens"]
  model: Provider.Model
}) {
  const config = await Config.get()
  if (config.compaction?.auto === false) return false  // User disabled auto-compaction

  const context = input.model.limit.context
  if (context === 0) return false  // Model has no context limit

  // Calculate current token usage
  const count = input.tokens.total ||
    input.tokens.input + input.tokens.output +
    input.tokens.cache.read + input.tokens.cache.write

  // Calculate usable space = input limit (or the full context window if no input limit is set) - reserved buffer
  const reserved = config.compaction?.reserved ??
    Math.min(COMPACTION_BUFFER, ProviderTransform.maxOutputTokens(input.model))
  const usable = input.model.limit.input
    ? input.model.limit.input - reserved
    : context - reserved

  return count >= usable
}
```

**Key Parameters**:

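* `COMPACTION_BUFFER = 20_000`: The default reserve of 20K tokens at the end of the context window. It ensures that, when overflow is detected, there is still enough space for the LLM to generate a final response and a compaction summary.
* `reserved`: The space actually reserved. If `config.compaction.reserved` is set, that value is used; otherwise it is the smaller of `COMPACTION_BUFFER` and the model's maximum output tokens. Overflow is reported once the token count reaches the input limit (or the context window, if the model defines no separate input limit) minus `reserved`.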
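
The threshold arithmetic can be tried in isolation. The sketch below is a simplified, synchronous stand-in for the check above, not the OpenCode implementation: `wouldOverflow`, `TokenUsage`, `ModelLimits`, and `reservedOverride` are hypothetical names, and `limit.output` is assumed to play the role of `ProviderTransform.maxOutputTokens(input.model)`.

```typescript
// Simplified stand-ins for the types used by isOverflow().
// These shapes are assumptions made for illustration.
interface TokenUsage {
  total?: number
  input: number
  output: number
  cache: { read: number; write: number }
}

interface ModelLimits {
  context: number  // full context window; 0 means "no limit"
  input?: number   // optional separate input limit
  output: number   // maximum output tokens
}

const COMPACTION_BUFFER = 20_000

function wouldOverflow(
  tokens: TokenUsage,
  limit: ModelLimits,
  reservedOverride?: number, // stands in for config.compaction.reserved
): boolean {
  if (limit.context === 0) return false

  // Prefer the reported total; otherwise add up the individual counters.
  const count =
    tokens.total ||
    tokens.input + tokens.output + tokens.cache.read + tokens.cache.write

  // A configured value wins; otherwise reserve min(buffer, max output tokens).
  const reserved = reservedOverride ?? Math.min(COMPACTION_BUFFER, limit.output)
  const usable = (limit.input ? limit.input : limit.context) - reserved

  return count >= usable
}

const limit: ModelLimits = { context: 200_000, output: 32_000 }
const usage = (input: number): TokenUsage => ({
  input,
  output: 0,
  cache: { read: 0, write: 0 },
})

console.log(wouldOverflow(usage(179_999), limit)) // false
console.log(wouldOverflow(usage(180_000), limit)) // true
```

With a 200,000-token window and a 20,000-token reserve, the check fires at exactly 180,000 tokens, which leaves the reserved space free for the final response and the summary.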

## 4.5.3 The Prune Strategy: Incremental Cleanup of Tool Outputs

Before triggering a full Compaction, OpenCode first attempts **Prune** -- a lighter-weight cleanup approach:

```typescript
export const PRUNE_MINIMUM = 20_000   // Clean at least 20K tokens
export const PRUNE_PROTECT = 40_000   // Protect tool outputs within the last 40K tokens
const PRUNE_PROTECTED_TOOLS = ["skill"] // Skill tool outputs are never cleaned

export async function prune(input: { sessionID: string }) {
  const config = await Config.get()
  if (config.compaction?.prune === false) return

  const msgs = await Session.messages({ sessionID: input.sessionID })
  const toPrune = []  // Tool parts whose output will be cleared
  let total = 0
  let turns = 0

  // Traverse from the most recent message backward
  loop: for (let msgIndex = msgs.length - 1; msgIndex >= 0; msgIndex--) {
    const msg = msgs[msgIndex]
    if (msg.info.role === "user") turns++
    if (turns < 2) continue          // Protect the last 2 conversation turns
    if (msg.info.role === "assistant" && msg.info.summary) break  // Stop at a summary

    for (let partIndex = msg.parts.length - 1; partIndex >= 0; partIndex--) {
      const part = msg.parts[partIndex]
      if (part.type === "tool" && part.state.status === "completed") {
        if (PRUNE_PROTECTED_TOOLS.includes(part.tool)) continue

        const estimate = Token.estimate(part.state.output)
        total += estimate

        if (total > PRUNE_PROTECT) {
          // Tool outputs beyond the protection zone -> mark for cleanup
          toPrune.push(part)
        }
      }
    }
  }

  // Execute cleanup: replace old tool outputs with a placeholder and record the compaction time
  for (const part of toPrune) {
    await Session.updatePart({
      ...part,
      state: {
        ...part.state,
        output: "[output truncated by compaction]",
        time: { ...part.state.time, compacted: Date.now() },
      },
    })
  }
}
```

**The Core Idea Behind the Prune Algorithm**:

1. **Traverse message history from back to front**
2. **Protect all content from the last 2 conversation turns**
3. **Stop at a previous compaction summary**, since everything before it has already been compacted
4. **Protect tool outputs within the last 40K tokens**
5. Older tool outputs beyond the protection zone are replaced with `"[output truncated by compaction]"`
6. `skill` tool outputs are never cleaned (because Skill content is critical for subsequent behavior)

The intuition behind this strategy is: **old tool outputs (such as file contents read 10 turns ago) are likely no longer relevant and can be safely discarded; recent tool outputs are still the basis for current reasoning and must be preserved**.
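
The steps above can be sketched against an in-memory history. This is an illustration under stated assumptions, not the OpenCode code: `Message`, `ToolPart`, and `pruneInMemory` are hypothetical, and the way `PRUNE_MINIMUM` is applied (skip the cleanup when it would free less than the minimum) is an assumption, because the excerpt above declares the constant without showing its use.

```typescript
// Hypothetical in-memory shapes; the real session types are richer.
interface ToolPart {
  tool: string
  output: string
  compacted?: number
}
interface Message {
  role: "user" | "assistant"
  summary?: boolean
  parts: ToolPart[]
}

const PRUNE_MINIMUM = 20_000
const PRUNE_PROTECT = 40_000
const PRUNE_PROTECTED_TOOLS = ["skill"]
const estimate = (text: string) => Math.ceil(text.length / 4)

// Returns the estimated number of tokens freed.
function pruneInMemory(msgs: Message[]): number {
  const toPrune: ToolPart[] = []
  let total = 0
  let pruned = 0
  let turns = 0

  // Walk from the newest message to the oldest.
  for (const msg of [...msgs].reverse()) {
    if (msg.role === "user") turns++
    if (turns < 2) continue                             // protect the last 2 turns
    if (msg.role === "assistant" && msg.summary) break  // stop at a summary

    for (const part of [...msg.parts].reverse()) {
      if (PRUNE_PROTECTED_TOOLS.includes(part.tool)) continue
      const size = estimate(part.output)
      total += size
      if (total > PRUNE_PROTECT) {
        pruned += size
        toPrune.push(part)
      }
    }
  }

  // Assumption: skip the cleanup entirely when it would free too little.
  if (pruned < PRUNE_MINIMUM) return 0

  for (const part of toPrune) {
    part.output = "[output truncated by compaction]"
    part.compacted = Date.now()
  }
  return pruned
}

const big = (chars: number) => "x".repeat(chars)
const history: Message[] = [
  { role: "user", parts: [] },
  { role: "assistant", parts: [{ tool: "read", output: big(200_000) }] }, // ~50K tokens, oldest
  { role: "user", parts: [] },
  { role: "assistant", parts: [{ tool: "read", output: big(120_000) }] }, // ~30K tokens
  { role: "user", parts: [] },
  { role: "assistant", parts: [{ tool: "read", output: big(80_000) }] },  // inside the last 2 turns
  { role: "user", parts: [] },
  { role: "assistant", parts: [{ tool: "read", output: big(400_000) }] }, // newest turn
]

const freed = pruneInMemory(history)
console.log(freed) // 50000
```

In this run the two newest turns are skipped, the ~30K-token output fits inside the 40K protection window, and only the oldest ~50K-token output is replaced.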

## 4.5.4 The Compaction Execution Flow

If space is still insufficient after Prune, a full Compaction is triggered -- using the LLM to generate a **summary** that replaces the old message history:

```
Before Compaction:
+------------------------------------------+
| System Prompt                            |
| User Msg 1                               |
| Assistant Msg 1 (with tool calls)        |  <-- These will be replaced by a summary
| User Msg 2                               |
| Assistant Msg 2 (with tool calls)        |
| User Msg 3                               |
| Assistant Msg 3 (current)                |  <-- Preserved
+------------------------------------------+

After Compaction:
+------------------------------------------+
| System Prompt                            |
| [Compaction Summary]                     |  <-- Summary generated by the LLM
| "Summary of previous conversation:       |
|  The user requested a sorting function,  |
|  I completed the quicksort               |
|  implementation and passed tests..."     |
| User Msg 3                               |  <-- Recent conversation preserved
| Assistant Msg 3 (current)                |
+------------------------------------------+
```

The Prompt template used for Compaction is defined in `agent/prompt/compaction.txt`.
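
The before/after picture can be expressed as a small pure function. The sketch below only shows how the history is rewritten once the summary text exists; generating that text is an LLM call in practice and is left out. `ChatMessage`, `applySummary`, and `keepRecent` are hypothetical names, and the `summary` flag is assumed to correspond to the `msg.info.summary` check in the prune loop.

```typescript
// Hypothetical, flattened message shape used only for this sketch.
interface ChatMessage {
  role: "user" | "assistant"
  text: string
  summary?: boolean
}

// Replace everything except the last `keepRecent` messages with one summary message.
function applySummary(
  history: ChatMessage[],
  keepRecent: number,
  summaryText: string,
): ChatMessage[] {
  if (keepRecent >= history.length) return history // nothing old enough to replace

  const recent = history.slice(history.length - keepRecent)
  const summary: ChatMessage = {
    role: "assistant",
    text: summaryText,
    summary: true, // lets later passes recognize the compaction boundary
  }
  return [summary, ...recent]
}

const before: ChatMessage[] = [
  { role: "user", text: "User Msg 1" },
  { role: "assistant", text: "Assistant Msg 1" },
  { role: "user", text: "User Msg 2" },
  { role: "assistant", text: "Assistant Msg 2" },
  { role: "user", text: "User Msg 3" },
  { role: "assistant", text: "Assistant Msg 3" },
]

const after = applySummary(
  before,
  2,
  "Summary of previous conversation: the user requested a sorting function...",
)
console.log(after.map((m) => (m.summary ? "[Compaction Summary]" : m.text)))
// [ '[Compaction Summary]', 'User Msg 3', 'Assistant Msg 3' ]
```

The system prompt is not part of `history` here because it is sent separately on every request and is unaffected by compaction.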

## 4.5.5 The Token Estimation Utility

Since precise token counting requires the model's tokenizer, OpenCode uses a fast estimation in certain scenarios:

```typescript
// util/token.ts
export function estimate(text: string): number {
  // Rough estimate: approximately 1 token per 4 characters
  return Math.ceil(text.length / 4)
}
```

This estimate does not need to be precise -- in the Prune strategy it only gauges roughly how many tokens each tool output occupies, which decides what falls outside the protected window. The precise token counts come from the LLM Provider's response metadata (the `usage` field).
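
A few sample values show how the 4-characters-per-token rule relates to the Prune thresholds. The function is repeated here so the snippet is self-contained; the inputs are made up for illustration.

```typescript
function estimate(text: string): number {
  // Rough estimate: approximately 1 token per 4 characters
  return Math.ceil(text.length / 4)
}

console.log(estimate(""))                  // 0
console.log(estimate("hello world"))       // 3 (11 characters, rounded up)
console.log(estimate("x".repeat(80_000)))  // 20000, the PRUNE_MINIMUM threshold
console.log(estimate("x".repeat(160_000))) // 40000, the PRUNE_PROTECT threshold
```

By this measure, roughly 160,000 characters of recent tool output fill the protected window. The ratio is a heuristic: text in other scripts or dense code may tokenize quite differently, which is acceptable here because the result only drives a coarse cleanup decision.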


