
Compaction

Four primitives for keeping long conversations within token limits. Pick the one that matches your access pattern.

LLM context windows are finite, and conversations grow. Compaction keeps the model's view of history small without losing the parts that matter. Crux gives you four primitives; pick the one that matches how you need to compact, not what you are compacting.

What problem does this solve?

Without compaction, a long-running chat or agent loop eventually overflows the context window. Naive solutions — drop the oldest messages, or summarize everything every N turns — either lose information or burn tokens summarizing things you didn't need to summarize.

Crux's four primitives target four different access patterns: stateful rolling compression, advisory pressure tracking, one-shot stateless summarization, and structured fact extraction. They compose: budget tracking decides when to call the sliding window or fact extraction.

The four primitives, side by side

| Primitive | Stateful? | Calls LLM? | Use when |
|---|---|---|---|
| createSlidingWindow() | Yes: keeps a running summary in a store | Yes: summarizes evicted messages | Building a chat that needs auto-rolling history |
| createBudgetManager() | No: pure tracker | No | You want to decide when to compact based on token pressure |
| summarizeMessages() | No: pure function | Yes: one summarization call | Ad-hoc summarization of a batch of messages |
| extractKeyFacts() | No: pure function | Yes: one structured-generation call | Pull typed objects (decisions, action items) out of a conversation |

All four primitives are exported from @crux/core/compaction:

```typescript
import {
  createSlidingWindow,
  createBudgetManager,
  summarizeMessages,
  extractKeyFacts,
} from '@crux/core/compaction'
```

When should I use which?

Sliding window is right when:

  • You're building a long-running chat or agent loop
  • You want compaction to happen automatically when the message count crosses a threshold
  • A running summary is acceptable as the representation of older history
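The behavior in the list above (automatic compaction once the message count crosses a threshold, with a running summary standing in for older history) can be sketched in a few lines. This is not the createSlidingWindow() API, whose options are not documented on this page: RollingWindow, maxMessages, keepRecent, and the Summarizer callback are hypothetical names, and the summarizer is synchronous so the sketch runs offline, whereas a real LLM call would be async.

```typescript
// Hypothetical sketch of a rolling-summary window; not the Crux createSlidingWindow() API.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Stand-in for the LLM call that folds evicted messages into the running summary.
// Synchronous so the sketch runs offline; a real model call would return a Promise.
type Summarizer = (previousSummary: string, evicted: Message[]) => string;

class RollingWindow {
  private summary = "";
  private messages: Message[] = [];
  private readonly maxMessages: number; // compaction triggers above this count
  private readonly keepRecent: number; // newest messages kept verbatim after compaction
  private readonly summarize: Summarizer;

  constructor(maxMessages: number, keepRecent: number, summarize: Summarizer) {
    if (keepRecent >= maxMessages) {
      throw new Error("keepRecent must be smaller than maxMessages");
    }
    this.maxMessages = maxMessages;
    this.keepRecent = keepRecent;
    this.summarize = summarize;
  }

  add(message: Message): void {
    this.messages.push(message);
    // Compact automatically once the message count crosses the threshold.
    if (this.messages.length > this.maxMessages) {
      const cut = this.messages.length - this.keepRecent;
      const evicted = this.messages.slice(0, cut);
      this.messages = this.messages.slice(cut);
      this.summary = this.summarize(this.summary, evicted);
    }
  }

  // What gets sent to the model: the running summary (if any) plus recent messages.
  view(): Message[] {
    const head: Message[] = this.summary
      ? [{ role: "system", content: `Summary of earlier conversation: ${this.summary}` }]
      : [];
    return [...head, ...this.messages];
  }
}

// Usage with a toy summarizer that just concatenates message contents.
const chat = new RollingWindow(4, 2, (prev, evicted) =>
  [prev, ...evicted.map((m) => m.content)].filter(Boolean).join(" | "),
);
for (const text of ["m1", "m2", "m3", "m4", "m5"]) {
  chat.add({ role: "user", content: text });
}
```

Keeping keepRecent well below maxMessages means each compaction evicts a batch of messages rather than one at a time, so the summarizer runs less often.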

Budget manager is right when:

  • You want to measure pressure and decide when to compact, but apply your own strategy
  • You're combining multiple compaction approaches (e.g. extract first, then summarize the rest)
  • You want pressure-level callbacks for telemetry
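An advisory tracker of this kind can be sketched as follows. It is not the createBudgetManager() API: PressureTracker, the named slices, the warning/critical ratios, and the 4-characters-per-token estimate are assumptions for illustration. It only measures and reports; deciding what to compact stays with the caller.

```typescript
// Hypothetical sketch of an advisory token-pressure tracker; not the Crux createBudgetManager() API.
type Pressure = "ok" | "warning" | "critical";

interface TrackerOptions {
  maxTokens: number;
  warningRatio: number; // e.g. 0.7 of maxTokens
  criticalRatio: number; // e.g. 0.9 of maxTokens
  onPressureChange?: (level: Pressure, usedTokens: number) => void; // telemetry hook
}

// Rough heuristic of ~4 characters per token; swap in your model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

class PressureTracker {
  private readonly slices: Record<string, number> = {};
  private level: Pressure = "ok";
  private readonly opts: TrackerOptions;

  constructor(opts: TrackerOptions) {
    this.opts = opts;
  }

  // Record the current size of a named slice (system, history, tools, ...).
  set(slice: string, text: string): Pressure {
    this.slices[slice] = estimateTokens(text);
    return this.check();
  }

  usedTokens(): number {
    return Object.keys(this.slices).reduce((sum, key) => sum + (this.slices[key] ?? 0), 0);
  }

  private check(): Pressure {
    const used = this.usedTokens();
    const ratio = used / this.opts.maxTokens;
    const next: Pressure =
      ratio >= this.opts.criticalRatio ? "critical" : ratio >= this.opts.warningRatio ? "warning" : "ok";
    // Advisory only: fire the callback on level changes, never touch any messages.
    if (next !== this.level) {
      this.level = next;
      this.opts.onPressureChange?.(next, used);
    }
    return next;
  }
}
```

Because the callback fires only when the level changes, it suits telemetry without flooding it; the caller decides whether a warning means extracting facts and a critical means summarizing.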

Summarize messages is right when:

  • You have a bounded batch of messages and want one summary right now
  • You don't need a stateful pipeline — fire and forget

Extract key facts is right when:

  • You want typed structured output out of a conversation, not prose
  • You're capturing decisions, action items, agreements — discrete facts, not narrative
  • The summary alone would lose structure you need to act on
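Structured extraction differs from summarization mainly in its output contract: ask the model for JSON matching a typed shape, then validate it before trusting it. A minimal sketch, assuming the model is prompted to return a JSON array; the KeyFact union, isKeyFact, parseFacts, extractFacts, and the prompt are hypothetical and not the extractKeyFacts() API, which may rely on a schema library or native structured output instead.

```typescript
// Hypothetical sketch of typed fact extraction; not the Crux extractKeyFacts() API.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

type KeyFact =
  | { kind: "decision"; text: string }
  | { kind: "action_item"; text: string; owner?: string };

// Stand-in for an LLM client that resolves to the raw model reply.
type Complete = (prompt: string) => Promise<string>;

// Runtime check: model output is untrusted until it matches the KeyFact shape.
function isKeyFact(value: unknown): value is KeyFact {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v["text"] !== "string") return false;
  if (v["kind"] === "decision") return true;
  return v["kind"] === "action_item" && (v["owner"] === undefined || typeof v["owner"] === "string");
}

// Parse the reply and keep only well-formed facts; malformed JSON yields no facts.
function parseFacts(raw: string): KeyFact[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return [];
  }
  return Array.isArray(data) ? data.filter(isKeyFact) : [];
}

async function extractFacts(messages: Message[], complete: Complete): Promise<KeyFact[]> {
  const transcript = messages.map((m) => `${m.role}: ${m.content}`).join("\n");
  const prompt =
    "Return only a JSON array of facts from this conversation. Each element is " +
    '{"kind":"decision","text":string} or {"kind":"action_item","text":string,"owner"?:string}.\n\n' +
    transcript;
  return parseFacts(await complete(prompt));
}
```

Dropping malformed entries instead of throwing is one design choice; if silently losing a fact is worse than failing the turn, reject the whole reply instead.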

When should I NOT use compaction?

  • You're operating on an already-compacted message history: don't compact it a second time
  • You only ever have a handful of messages: the token cost of compaction exceeds the savings
  • You need an audit log of every message: compaction loses detail by design, so persist the originals elsewhere
  • The dropped detail can't be recovered from a summary: consider extracting key facts instead so the structured data survives

How to combine them

In production chat apps you usually layer:

  1. The budget manager tracks pressure across the system prompt, history, and tools
  2. When pressure crosses the warning threshold, extract key facts from older messages so structured information survives
  3. When it crosses the critical threshold, the sliding window evicts and summarizes the oldest messages

This is more work than a single primitive, but the result is targeted compaction — extraction preserves the structure you'll need next turn; summarization compresses what's safe to lose.
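The three-layer strategy can be sketched as a single compact step run after each turn. Everything here is hypothetical: the Layers and State shapes, the 0.7 and 0.9 thresholds, and the synchronous extractFacts and summarize callbacks, which stand in for the real extraction and summarization calls. The sketch only illustrates the ordering; it is not how Crux wires its primitives together.

```typescript
// Hypothetical sketch of layered compaction; none of these names are Crux APIs.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

type Level = "ok" | "warning" | "critical";

interface Layers {
  maxTokens: number;
  baseTokens: number; // system prompt + tool definitions
  keepRecent: number; // newest messages are never compacted
  countTokens: (m: Message) => number;
  extractFacts: (older: Message[]) => string[]; // stand-in for structured extraction
  summarize: (previous: string, older: Message[]) => string; // stand-in for summarization
}

interface State {
  facts: string[]; // structured facts that survive eviction
  summary: string; // running summary of evicted messages
  history: Message[]; // messages still sent verbatim
  minedUpTo: number; // history[0, minedUpTo) has already had facts extracted
}

// Layer 1: measure pressure across system + history + tools.
function pressure(state: State, layers: Layers): Level {
  const used = state.history.reduce((sum, m) => sum + layers.countTokens(m), layers.baseTokens);
  const ratio = used / layers.maxTokens;
  return ratio >= 0.9 ? "critical" : ratio >= 0.7 ? "warning" : "ok";
}

function compact(state: State, layers: Layers): State {
  const level = pressure(state, layers);
  if (level === "ok") return state;

  const split = Math.max(0, state.history.length - layers.keepRecent);

  // Layer 2 (warning and above): extract facts from older messages not yet mined.
  const unmined = state.history.slice(state.minedUpTo, split);
  const facts = unmined.length > 0 ? [...state.facts, ...layers.extractFacts(unmined)] : state.facts;
  const minedUpTo = Math.max(state.minedUpTo, split);
  if (level === "warning" || split === 0) return { ...state, facts, minedUpTo };

  // Layer 3 (critical): evict the older messages into the running summary.
  return {
    facts,
    summary: layers.summarize(state.summary, state.history.slice(0, split)),
    history: state.history.slice(split),
    minedUpTo: 0,
  };
}
```

Extraction runs before eviction on purpose: once older messages are folded into prose, any structure they held survives only as well as the summary preserves it.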

How this fits with the rest of Crux

  • Token budget drops contexts under pressure. Compaction shortens messages. They compose — budget tracking can include both.
  • Memory stores entries permanently. Compaction operates on the message history fed to a single LLM call.
  • Eval can score compaction quality with evaluateCompaction() and pre-built judges.
