Sliding window

Stateful auto-rolling compression. Keep recent messages verbatim, summarize older ones.

A sliding window keeps the most recent N messages verbatim and automatically summarizes older messages when the window overflows. The summary lives in a CruxStore keyed by the window's id, so the running summary survives across requests.

What problem does this solve?

Long-running chat loops eventually exceed the context window. Dropping the oldest messages loses information, while re-summarizing the entire history on every turn is expensive and noisy. The sliding window is the middle ground: recent messages stay at full fidelity (the model sees the verbatim text), and older messages are folded into a compact running summary that stays cheap to include.

When should I use it?

  • You're building a chat or agent loop with rolling history
  • You want compaction to happen automatically when the window overflows
  • A running summary of older turns is acceptable as the representation

When should I NOT use it?

  • You only need the last N messages and don't care about older ones: keep it simple, no summarizer needed
  • You need structured data out of older messages: use extract key facts instead, possibly before the sliding window evicts them
  • You're processing a bounded batch of messages: use a one-shot summarize messages call instead
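
For the first case above, plain truncation is usually enough. The following is a minimal sketch, not part of Crux; the `Message` type and the `lastN` helper are hypothetical names introduced for illustration.

```typescript
// Minimal message shape, used for illustration only.
type Message = { role: 'system' | 'user' | 'assistant'; content: string }

// Keep only the last `n` messages; older ones are dropped, not summarized.
function lastN(messages: Message[], n: number): Message[] {
  // slice(-0) would return the whole array, so handle n <= 0 explicitly.
  return n <= 0 ? [] : messages.slice(-n)
}

const history: Message[] = [
  { role: 'user', content: 'Hello!' },
  { role: 'assistant', content: 'Hi there!' },
  { role: 'user', content: 'What is a sliding window?' },
]

const recent = lastN(history, 2)
```

Anything older than the last `n` messages is simply gone, which is the trade-off the sliding window is meant to avoid.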

Quick start

sliding-window.ts
import { generateTextFn } from '@crux/ai'
import { createSlidingWindow } from '@crux/core/compaction'

// cheapModel and myStore are assumed to be defined elsewhere in your app.
const window = createSlidingWindow({
  id: 'chat-1',                 // store key — same id reuses the running summary
  windowSize: 20,               // keep last 20 messages verbatim
  generate: generateTextFn,
  model: cheapModel,            // a fast/cheap model is fine for summarization
  summaryBudget: 1000,          // max tokens for the running summary
  store: myStore,               // optional persistent store
})

await window.push({ role: 'user', content: 'Hello!' })
await window.push({ role: 'assistant', content: 'Hi there!' })

// On every turn, get the compacted message array (summary + recent messages)
const messages = await window.getMessages()

window.getStats()
// { totalMessages: 25, windowedMessages: 20, summaryTokens: 450, evictions: 1 }

Pass a cheap, fast model as model here; summaries don't need your best model. Something like gpt-4o-mini, claude-haiku, or gemini-flash keeps compaction fast and inexpensive.

How eviction works

On every push():

  1. The new message gets appended to the in-memory window
  2. If the window now holds more than windowSize messages, the oldest messages beyond that limit are evicted
  3. Evicted messages get passed to summarizeMessages() along with the existing running summary
  4. The merged summary is saved back to the store under compact:{id}:summary

The next getMessages() call returns:

[
  { role: 'system', content: '<running summary>' },
  ...recent messages within the window
]

The summary message uses role system by default. Configure with the summaryRole option if your provider needs a different placement.
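
The eviction steps can be illustrated with a toy model. This is a sketch under stated assumptions, not the Crux implementation: `createToyWindow`, `Summarize`, and `concatSummarize` are hypothetical names, the summarizer is a synchronous stub instead of an LLM call (the real push() is async), the store write in step 4 is left out, and omitting the summary message while no summary exists is an assumption.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string }

// Hypothetical stand-in for an LLM-backed summarizer: merges the existing
// summary with newly evicted messages. A real one would call a model.
type Summarize = (evicted: Message[], previousSummary: string) => string

function createToyWindow(windowSize: number, summarize: Summarize) {
  let recent: Message[] = []
  let summary = ''
  let evictions = 0

  return {
    push(message: Message): void {
      // 1. Append the new message to the in-memory window.
      recent.push(message)
      // 2. On overflow, evict the oldest messages beyond windowSize.
      if (recent.length > windowSize) {
        const overflow = recent.length - windowSize
        const evicted = recent.slice(0, overflow)
        recent = recent.slice(overflow)
        // 3. Fold the evicted messages into the running summary.
        summary = summarize(evicted, summary)
        evictions += 1
      }
    },
    getMessages(): Message[] {
      // Summary first (if one exists), then the verbatim recent messages.
      return summary
        ? [{ role: 'system', content: summary }, ...recent]
        : [...recent]
    },
    getEvictions: () => evictions,
  }
}

// Stub summarizer: concatenates content instead of calling a model.
const concatSummarize: Summarize = (evicted, previous) =>
  [previous, ...evicted.map((m) => m.content)].filter(Boolean).join(' | ')

const toy = createToyWindow(2, concatSummarize)
toy.push({ role: 'user', content: 'a' })
toy.push({ role: 'assistant', content: 'b' })
toy.push({ role: 'user', content: 'c' }) // overflows: 'a' is evicted
```

After the third push, the toy window returns a system message holding the stub summary of 'a', followed by 'b' and 'c' verbatim.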

Persistence

Without a store, the running summary is lost on process restart. For a chat app, pass a CruxStore:

import { cruxConvexStore } from '@crux/convex'

const window = createSlidingWindow({
  id: `chat-${threadId}`,    // per-thread isolation
  store: cruxConvexStore({ component: components.crux, ctx }),
  windowSize: 20,
  generate: generateTextFn,
  model: cheapModel,
})

The id becomes the store key prefix — different threads must use different ids.
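
To see why distinct ids matter, here is a small sketch of how a summary key derived from the id isolates threads. The `ToyStore` interface and `createMemoryStore` are hypothetical and in-memory only; the real CruxStore API may look different. The key format is the one given in the eviction steps (compact:{id}:summary).

```typescript
// Hypothetical minimal store interface; the real CruxStore API may differ.
interface ToyStore {
  get(key: string): string | undefined
  set(key: string, value: string): void
}

function createMemoryStore(): ToyStore {
  const data = new Map<string, string>()
  return {
    get: (key) => data.get(key),
    set: (key, value) => {
      data.set(key, value)
    },
  }
}

// Key format taken from the eviction description: compact:{id}:summary
const summaryKey = (id: string): string => `compact:${id}:summary`

const store = createMemoryStore()
store.set(summaryKey('chat-thread-a'), 'Summary of thread A')
store.set(summaryKey('chat-thread-b'), 'Summary of thread B')

// A window recreated with the same id would find its earlier summary...
const resumed = store.get(summaryKey('chat-thread-a'))
// ...while an id that was never used starts with no summary.
const fresh = store.get(summaryKey('chat-thread-c'))
```

Two threads sharing one id would read and overwrite the same key, which is why the id should include something unique per thread.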

Hooks

const window = createSlidingWindow({
  // ...
  onCompactStart: ({ evictedCount, totalTokens }) => log.info('compacting'),
  onCompactEnd: ({ summaryTokens, ratio }) => metrics.gauge('summary.ratio', ratio),
})

onCompactEnd fires whenever a summarization round completes. It carries tokensBefore, tokensAfter, and ratio — useful for telemetry on how aggressive compaction is being.
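
As one possible use of this hook, the sketch below accumulates how many tokens compaction has saved. The event type is an assumption based on the tokensBefore and tokensAfter fields mentioned above (the real payload may carry more or differently named fields), and `createCompactionTracker` is a hypothetical helper.

```typescript
// Assumed event shape, based on the fields named in the text above.
// Other fields the real payload may carry (such as ratio) are not used here.
type CompactEndEvent = { tokensBefore: number; tokensAfter: number }

function createCompactionTracker() {
  const events: CompactEndEvent[] = []
  return {
    // Intended to be passed as the onCompactEnd option.
    onCompactEnd(event: CompactEndEvent): void {
      events.push(event)
    },
    // Number of summarization rounds observed so far.
    rounds: (): number => events.length,
    // Total tokens removed from the context across all rounds.
    tokensSaved: (): number =>
      events.reduce((sum, e) => sum + (e.tokensBefore - e.tokensAfter), 0),
  }
}

const tracker = createCompactionTracker()

// Simulated events with made-up numbers, standing in for real compaction rounds.
tracker.onCompactEnd({ tokensBefore: 1200, tokensAfter: 300 })
tracker.onCompactEnd({ tokensBefore: 800, tokensAfter: 350 })
```

In an application, `tracker.onCompactEnd` would be passed in the createSlidingWindow options instead of being called by hand.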

Cost considerations

The sliding window calls the LLM only when the window overflows, so how often summarization runs depends on windowSize and on message length. Each summarization call replaces the previous summary rather than appending to it, so the summary does not grow with the length of the conversation.

For a typical chat with 5–10 sentence turns, expect summarization to fire every 15–25 turns. The cost stays bounded by summaryBudget.
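
A back-of-the-envelope estimate of the per-turn context size follows from these settings. This is a rough sketch, not a Crux utility: `estimateContextTokens` is a hypothetical helper, and the average of 150 tokens per message is an assumed figure chosen for illustration.

```typescript
// Rough estimate of tokens sent per turn:
// running summary (capped by summaryBudget) + verbatim window.
function estimateContextTokens(opts: {
  windowSize: number
  avgTokensPerMessage: number // assumed average, not measured
  summaryBudget: number
}): number {
  return opts.summaryBudget + opts.windowSize * opts.avgTokensPerMessage
}

const estimate = estimateContextTokens({
  windowSize: 20,
  avgTokensPerMessage: 150,
  summaryBudget: 1000,
})
// 1000 + 20 * 150 = 4000
```

The estimate does not depend on how long the conversation has been running, which is the practical meaning of the cost being bounded by summaryBudget.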
