Sliding window

Stateful auto-rolling compression. Keep recent messages verbatim, summarize older ones.

A sliding window keeps the most recent N messages verbatim and automatically summarizes older messages when the window overflows. The summary lives in a CruxStore keyed by the window's id, so the running summary survives across requests.

What problem does this solve?

Long-running chat loops eventually exceed the context window. Dropping the oldest messages loses information. Summarizing the entire history on every turn is expensive and noisy. The sliding window is the middle ground: recent messages stay full-fidelity (the model sees the verbatim text), while older messages get folded into a compact running summary that stays cheap to include.

When should I use it?

You're building a chat or agent loop with rolling history
You want compaction to happen automatically when the window overflows
A running summary of older turns is acceptable as the representation

When should I NOT use it?

You only need the last N messages and don't care about older ones: keep it simple, no summarizer needed
You need structured data out of older messages: use extract key facts instead, possibly before the sliding window evicts them
You're processing a bounded batch of messages: use summarize messages as a one-shot call instead
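
For the first case, plain truncation is usually enough. A minimal sketch, assuming a simple { role, content } message shape; the ChatMessage type and the lastN helper are illustrative names, not part of the library:

```typescript
// Illustrative message shape; the library's own type may differ.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

// Keep only the most recent `n` messages. Older ones are dropped, not summarized.
function lastN(messages: ChatMessage[], n: number): ChatMessage[] {
  // slice(-0) would return the whole array, so handle n <= 0 explicitly.
  return n > 0 ? messages.slice(-n) : []
}

const fullHistory: ChatMessage[] = [
  { role: 'user', content: 'Hello!' },
  { role: 'assistant', content: 'Hi there!' },
  { role: 'user', content: 'What is a sliding window?' },
]

// Only the last two messages are sent to the model.
const recent = lastN(fullHistory, 2)
```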

Quick start

sliding-window.ts

import { generateTextFn } from '@crux/ai'
import { createSlidingWindow } from '@crux/core/compaction'

const window = createSlidingWindow({
  id: 'chat-1',                 // store key — same id reuses the running summary
  windowSize: 20,               // keep last 20 messages verbatim
  generate: generateTextFn,
  model: cheapModel,            // a fast/cheap model is fine for summarization
  summaryBudget: 1000,          // max tokens for the running summary
  store: myStore,               // optional persistent store
})

await window.push({ role: 'user', content: 'Hello!' })
await window.push({ role: 'assistant', content: 'Hi there!' })

// On every turn, get the compacted message array (summary + recent messages)
const messages = await window.getMessages()

window.getStats()
// Example output from a longer conversation:
// { totalMessages: 25, windowedMessages: 20, summaryTokens: 450, evictions: 1 }

Use a cheap, fast model for the model option here; summaries don't need your best model. Something like gpt-4o-mini, claude-haiku, or gemini-flash keeps compaction fast and inexpensive.
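
In a chat loop, the window typically sits between your message handler and the main model call. The sketch below shows one turn. It is written against a small hypothetical WindowLike interface that mirrors only the push() and getMessages() calls from the quick start, and callModel is a placeholder for your own model call; neither name comes from the library.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

// Hypothetical minimal interface: only the two methods used in the quick start.
interface WindowLike {
  push(message: ChatMessage): Promise<unknown>
  getMessages(): Promise<ChatMessage[]>
}

// Placeholder for the main (expensive) model call.
type CallModel = (messages: ChatMessage[]) => Promise<string>

// One chat turn: record the user message, send the compacted history to the
// model, then record the reply so the next turn sees it.
async function runTurn(
  win: WindowLike,
  callModel: CallModel,
  userText: string,
): Promise<string> {
  await win.push({ role: 'user', content: userText })
  const messages = await win.getMessages()
  const reply = await callModel(messages)
  await win.push({ role: 'assistant', content: reply })
  return reply
}
```

Pushing the assistant reply back into the window matters: otherwise the running summary only ever reflects one side of the conversation.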

How eviction works

On every push():

The new message gets appended to the in-memory window
If the window now holds more than windowSize messages, the oldest messages beyond that limit are evicted
Evicted messages get passed to summarizeMessages() along with the existing running summary
The merged summary is saved back to the store under compact:{id}:summary

The next getMessages() call returns:

[
  { role: 'system', content: '<running summary>' },
  ...recent messages within the window
]

The summary message uses the system role by default. Set the summaryRole option if your provider needs the summary under a different role.
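
The eviction steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the library's implementation: SketchWindow and joinSummarizer are hypothetical names, the summarizer is a stub that joins text instead of calling a model, persistence is omitted, and getMessages() is synchronous here for brevity.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

// Stand-in for the summarizer: merges evicted messages into the previous summary.
type Summarize = (evicted: ChatMessage[], previousSummary: string) => Promise<string>

class SketchWindow {
  private recent: ChatMessage[] = []
  private summary = ''
  private windowSize: number
  private summarize: Summarize
  evictions = 0

  constructor(windowSize: number, summarize: Summarize) {
    this.windowSize = windowSize
    this.summarize = summarize
  }

  async push(message: ChatMessage): Promise<void> {
    // 1. Append to the in-memory window.
    this.recent.push(message)
    // 2. Evict whatever exceeds the window size, oldest first.
    const overflow = this.recent.length - this.windowSize
    if (overflow <= 0) return
    const evicted = this.recent.splice(0, overflow)
    // 3. Fold the evicted messages into the running summary.
    this.summary = await this.summarize(evicted, this.summary)
    // 4. A real implementation would save the summary to the store here.
    this.evictions += 1
  }

  getMessages(): ChatMessage[] {
    // The summary, if any, goes first as a system message.
    const head: ChatMessage[] = this.summary
      ? [{ role: 'system', content: this.summary }]
      : []
    return [...head, ...this.recent]
  }
}

// Stub summarizer for experimentation: concatenates text instead of calling a model.
const joinSummarizer: Summarize = async (evicted, previousSummary) =>
  [previousSummary, ...evicted.map((m) => m.content)].filter(Boolean).join(' | ')
```

Because the previous summary is passed back into the summarizer, each round only has to process the newly evicted messages rather than the whole history.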

Persistence

Without a store, the running summary is lost on process restart. For a chat app, pass a CruxStore:

import { cruxConvexStore } from '@crux/convex'

const window = createSlidingWindow({
  id: `chat-${threadId}`,    // per-thread isolation
  store: cruxConvexStore({ component: components.crux, ctx }),
  windowSize: 20,
  generate: generateTextFn,
  model: cheapModel,
})

The id is embedded in the store key (compact:{id}:summary), so different threads must use different ids.
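
To illustrate why distinct ids matter, here is a sketch of how summaries for two threads stay separate. It assumes the compact:{id}:summary key format described under How eviction works; the Map-backed store and the summaryKey helper are hypothetical stand-ins, not the CruxStore interface.

```typescript
// Hypothetical helper that builds the key format mentioned in the eviction section.
function summaryKey(id: string): string {
  return `compact:${id}:summary`
}

// In-memory stand-in for a persistent store.
const fakeStore = new Map<string, string>()

fakeStore.set(summaryKey('chat-thread-a'), 'Summary of thread A')
fakeStore.set(summaryKey('chat-thread-b'), 'Summary of thread B')

// Writing under an existing id replaces that thread's summary;
// other threads are untouched.
fakeStore.set(summaryKey('chat-thread-a'), 'Updated summary of thread A')
```

If two conversations shared an id, they would read and overwrite the same entry, and each would inherit the other's summary.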

Hooks

const window = createSlidingWindow({
  // ...
  // Fires before a summarization round starts
  onCompactStart: ({ evictedCount, totalTokens }) => log.info('compacting', { evictedCount, totalTokens }),
  // Fires after the round completes
  onCompactEnd: ({ ratio }) => metrics.gauge('summary.ratio', ratio),
})

onCompactEnd fires whenever a summarization round completes. It carries tokensBefore, tokensAfter, and ratio, which are useful for telemetry on how aggressively compaction is shrinking the history.
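
If you don't have a metrics client at hand, a small accumulator can collect the same numbers. This is a sketch: the CompactEndEvent type is an assumed payload shape based on the fields named in this section, and createCompactionStats is an illustrative helper, not a library export.

```typescript
// Assumed payload shape, based on the fields named in this section; the real type may differ.
type CompactEndEvent = { tokensBefore: number; tokensAfter: number; ratio: number }

// Accumulates compaction results so they can be reported periodically.
function createCompactionStats() {
  let rounds = 0
  let tokensSaved = 0
  return {
    // Pass this function as the onCompactEnd hook.
    onCompactEnd(event: CompactEndEvent): void {
      rounds += 1
      tokensSaved += event.tokensBefore - event.tokensAfter
    },
    // Current totals since the accumulator was created.
    snapshot(): { rounds: number; tokensSaved: number } {
      return { rounds, tokensSaved }
    },
  }
}

const compactionStats = createCompactionStats()
// Wiring (illustrative): createSlidingWindow({ ..., onCompactEnd: compactionStats.onCompactEnd })
```

The hook closes over its counters rather than relying on this, so it can be passed directly as a callback.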

Where to next

Compaction overview

Budget manager

Extract key facts
