Sliding window
Stateful auto-rolling compression. Keep recent messages verbatim, summarize older ones.
A sliding window keeps the most recent N messages verbatim and automatically summarizes older messages when the window overflows. The summary lives in a CruxStore keyed by the window's id, so the running summary survives across requests.
What problem does this solve?
Long-running chat loops eventually exceed the context window. Dropping the oldest messages loses information. Summarizing the entire history on every turn is expensive and noisy. The sliding window is the middle ground: recent messages stay full-fidelity (the model sees the verbatim text), while older messages get folded into a compact running summary that stays cheap to include.
When should I use it?
- You're building a chat or agent loop with rolling history
- You want compaction to happen automatically when the window overflows
- A running summary of older turns is acceptable as the representation
When should I NOT use it?
- You only need the last N messages and don't care about older ones — keep it simple, no summarizer needed
- You need structured data out of older messages — use extract key facts instead, possibly before sliding-window evicts them
- You're processing a bounded batch of messages — use summarize messages one-shot
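For the first case above (only the last N messages matter), plain truncation is enough and no summarizer is involved. A minimal sketch, assuming a simple message shape; the `Message` type and `lastN` helper are illustrative names, not part of the library:

```typescript
// Hypothetical message shape, used only for this illustration.
type Message = { role: 'user' | 'assistant' | 'system'; content: string }

// Keep only the most recent `n` messages. Older ones are dropped, not summarized.
// The guard matters because slice(-0) would return the whole array.
function lastN(history: Message[], n: number): Message[] {
  return n > 0 ? history.slice(-n) : []
}
```

Anything older than the last `n` messages is gone for good, which is exactly the trade-off the sliding window avoids.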
Quick start
import { generateTextFn } from '@crux/ai'
import { createSlidingWindow } from '@crux/core/compaction'
const window = createSlidingWindow({
id: 'chat-1', // store key — same id reuses the running summary
windowSize: 20, // keep last 20 messages verbatim
generate: generateTextFn,
model: cheapModel, // a fast/cheap model is fine for summarization
summaryBudget: 1000, // max tokens for the running summary
store: myStore, // optional persistent store
})
await window.push({ role: 'user', content: 'Hello!' })
await window.push({ role: 'assistant', content: 'Hi there!' })
// On every turn, get the compacted message array (summary + recent messages)
const messages = await window.getMessages()
window.getStats()
// { totalMessages: 25, windowedMessages: 20, summaryTokens: 450, evictions: 1 }
Use a cheap, fast model for generate here. Summaries don't need your best model. A model such as gpt-4o-mini, claude-haiku, or gemini-flash keeps compaction fast and inexpensive.
How eviction works
On every push():
- The new message gets appended to the in-memory window
- If window.length > windowSize, the oldest messages above the threshold are evicted
- Evicted messages get passed to summarizeMessages() along with the existing running summary
- The merged summary is saved back to the store under compact:{id}:summary
The next getMessages() call returns:
[
{ role: 'system', content: '<running summary>' },
...recent messages within the window
]
The summary message uses role system by default. Configure with the summaryRole option if your provider needs a different placement.
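The push and getMessages behaviour described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `createSketchWindow`, the `Summarize` signature, and `fakeSummarize` are hypothetical names, the summary is held in memory rather than in a store, and the real eviction trigger may differ (for example, it may batch evictions or count tokens).

```typescript
type Message = { role: 'user' | 'assistant' | 'system'; content: string }

// Hypothetical summarizer signature: folds evicted messages into the previous summary.
type Summarize = (evicted: Message[], previousSummary: string) => Promise<string>

function createSketchWindow(windowSize: number, summarize: Summarize) {
  const recent: Message[] = []
  let summary = ''
  let evictions = 0

  return {
    async push(message: Message): Promise<void> {
      recent.push(message)
      const overflow = recent.length - windowSize
      if (overflow <= 0) return
      // Remove the oldest messages beyond the threshold and merge them into the summary.
      const evicted = recent.splice(0, overflow)
      summary = await summarize(evicted, summary)
      evictions += 1
    },
    getMessages(): Message[] {
      // No summary message until something has actually been evicted.
      if (summary === '') return [...recent]
      return [{ role: 'system', content: summary }, ...recent]
    },
    getStats() {
      return { windowedMessages: recent.length, evictions }
    },
  }
}

// Stand-in summarizer that concatenates text instead of calling a model.
const fakeSummarize: Summarize = async (evicted, previous) =>
  [previous, ...evicted.map((m) => m.content)].filter(Boolean).join(' | ')
```

Swapping `fakeSummarize` for a function that calls a model gives the same control flow: the summarizer only runs inside `push`, and only when the window has overflowed.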
Persistence
Without a store, the running summary is lost on process restart. For a chat app, pass a CruxStore:
import { cruxConvexStore } from '@crux/convex'
const window = createSlidingWindow({
id: `chat-${threadId}`, // per-thread isolation
store: cruxConvexStore({ component: components.crux, ctx }),
windowSize: 20,
generate: generateTextFn,
model: cheapModel,
})
The id becomes the store key prefix, so different threads must use different ids.
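To see why distinct ids matter, here is a small sketch of per-id key isolation. It uses the compact:{id}:summary key layout from the eviction section; the `KeyValueStore` interface, `createMemoryStore`, `saveSummary`, and `loadSummary` are hypothetical stand-ins, and the real CruxStore API may look different.

```typescript
// Hypothetical minimal key-value interface; not the actual CruxStore API.
interface KeyValueStore {
  get(key: string): Promise<string | undefined>
  set(key: string, value: string): Promise<void>
}

// In-memory stand-in for a persistent store.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>()
  return {
    async get(key: string): Promise<string | undefined> {
      return data.get(key)
    },
    async set(key: string, value: string): Promise<void> {
      data.set(key, value)
    },
  }
}

// Key layout as described in the eviction section: compact:{id}:summary
const summaryKey = (id: string): string => `compact:${id}:summary`

async function saveSummary(store: KeyValueStore, id: string, summary: string): Promise<void> {
  await store.set(summaryKey(id), summary)
}

async function loadSummary(store: KeyValueStore, id: string): Promise<string> {
  // An unknown id simply has no running summary yet.
  return (await store.get(summaryKey(id))) ?? ''
}
```

Two threads that share an id would read and overwrite the same key, so each would see the other's summary.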
Hooks
const window = createSlidingWindow({
// ...
onCompactStart: ({ evictedCount, totalTokens }) => log.info('compacting'),
onCompactEnd: ({ summaryTokens, ratio }) => metrics.gauge('summary.ratio', ratio),
})
onCompactEnd fires whenever a summarization round completes. It carries tokensBefore, tokensAfter, and ratio, which are useful for telemetry on how aggressive compaction is.
Cost considerations
Sliding window calls the LLM only when the window overflows. With windowSize: 20, summarization runs roughly once every N turns, where N depends on message length. Each summarization call replaces the previous summary rather than appending to it, so the summary does not grow with the length of the conversation.
For a typical chat with 5–10 sentence turns, expect summarization to fire every 15–25 turns. The cost stays bounded by summaryBudget.
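For a rough sense of scale, the arithmetic can be written down directly. This is a back-of-the-envelope sketch under assumptions that are not from the library: each summarization call reads at most the previous summary (capped at summaryBudget) plus the evicted messages, writes at most summaryBudget tokens, and prompt overhead is ignored. `estimateCompactionCost` and its input fields are hypothetical names.

```typescript
interface CostInputs {
  totalTurns: number                 // turns in the whole conversation
  turnsPerCompaction: number         // e.g. somewhere in the 15 to 25 range quoted above
  evictedTokensPerCompaction: number // assumed average size of the evicted messages
  summaryBudget: number              // max tokens for the running summary
}

// Upper-bound estimate of summarization traffic for one conversation.
function estimateCompactionCost(inputs: CostInputs) {
  const calls = Math.floor(inputs.totalTurns / inputs.turnsPerCompaction)
  // Each call reads the old summary plus the evicted messages...
  const inputTokens = calls * (inputs.summaryBudget + inputs.evictedTokensPerCompaction)
  // ...and writes a new summary no larger than the budget.
  const outputTokens = calls * inputs.summaryBudget
  return { calls, inputTokens, outputTokens }
}

const estimate = estimateCompactionCost({
  totalTurns: 200,
  turnsPerCompaction: 20,
  evictedTokensPerCompaction: 3000,
  summaryBudget: 1000,
})
// estimate: { calls: 10, inputTokens: 40000, outputTokens: 10000 }
```

Under these assumptions the per-call cost is constant, so total cost grows linearly with the number of compaction rounds rather than with the full history length.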