Budget manager
Advisory token-pressure tracker. Decide when to compact; apply your own strategy.
A budget manager tracks token usage across multiple sources (system, history, tools, retrieved context) and reports a pressure level: normal, warning, or critical. It does no compaction itself — it tells you when to act, not how.
What problem does this solve?
Different parts of a request consume tokens at different rates: a long-running chat history grows linearly, retrieved RAG context arrives in chunks, tools come and go between turns. To compact intelligently you need to know what is taking up the budget, not just that you're close to the limit.
The budget manager gives you a pure tracker: no LLM calls, no side effects, and no state beyond the counts you report. You report token counts as you build the request, then read the pressure level when you need to decide what to drop or compress.
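To make the arithmetic concrete, here is a minimal sketch of how such a tracker could compute pressure. It is not the library's implementation: `createSketchTracker` is a hypothetical name, and the inclusive threshold comparison and overwrite-on-repeat behaviour are assumptions made for illustration.

```typescript
// Hypothetical sketch for illustration; not the library's implementation.
type Level = 'normal' | 'warning' | 'critical'

interface BudgetState {
  used: number
  available: number
  pressure: number
  level: Level
  breakdown: Record<string, number>
}

function createSketchTracker(limit: number, warningThreshold = 0.8, criticalThreshold = 0.95) {
  // Latest reported count per source.
  const counts: Record<string, number> = {}

  return {
    // Assumption: a repeated report for the same source overwrites the old count.
    report(source: string, tokens: number): void {
      counts[source] = tokens
    },
    check(): BudgetState {
      // Sum every source, then express usage as a fraction of the limit.
      const used = Object.keys(counts).reduce((sum, key) => sum + counts[key], 0)
      const pressure = used / limit
      // Assumption: thresholds are inclusive (pressure >= threshold).
      const level: Level =
        pressure >= criticalThreshold ? 'critical' : pressure >= warningThreshold ? 'warning' : 'normal'
      return { used, available: limit - used, pressure, level, breakdown: { ...counts } }
    },
  }
}

const sketch = createSketchTracker(128_000)
sketch.report('system', 2000)
sketch.report('history', 45_000)
sketch.report('tools', 3000)
sketch.report('rag', 12_000)
const sketchState = sketch.check()
```

With these inputs the sketch reports 62,000 tokens used out of 128,000, which is below the 0.8 warning threshold, so the level is 'normal'.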
When should I use it?
- You want fine-grained pressure measurement by source (system / history / tools / context)
- You're combining compaction strategies (for example, extract structured facts first, then summarize the rest)
- You need a hook for telemetry — emit a metric every time pressure crosses a threshold
- You want to make compaction decisions in your code, not via a built-in primitive
When should I NOT use it?
- You're happy with sliding window's "compact when N is exceeded" — it doesn't need pressure tracking
- You don't have a token counter — without one, the budget manager has no inputs
- You only have one source of tokens — you can compute pressure inline without the abstraction
Quick start
import { createBudgetManager } from '@crux/core/compaction'
const budget = createBudgetManager({
  limit: 128_000,
  warningThreshold: 0.8, // 80% used → warning
  criticalThreshold: 0.95, // 95% used → critical
})
// Report token counts as you build the request
budget.report('system', 2000)
budget.report('history', 45_000)
budget.report('tools', 3000)
budget.report('rag', 12_000)
const state = budget.check()
state.used // 62000
state.available // 66000
state.pressure // 0.484
state.level // 'normal'
state.breakdown // { system: 2000, history: 45000, tools: 3000, rag: 12000 }
Reacting to pressure
The typical pattern: build the request, measure, decide, compact:
budget.report('system', countTokens(systemText))
budget.report('history', countTokens(messages))
budget.report('rag', countTokens(retrievedDocs))
const state = budget.check()
if (state.level === 'critical') {
  // Aggressive: extract structured facts + sliding-window
  await extractAndStoreFacts(messages)
  messages = await slidingWindow.getMessages()
} else if (state.level === 'warning') {
  // Soft: trim retrieved docs to top-K
  retrievedDocs = retrievedDocs.slice(0, 3)
}
Hooks
onBudgetCheck fires every time the pressure level changes, which is handy for metrics:
const budget = createBudgetManager({
  limit: 128_000,
  onBudgetCheck: ({ level, pressure, breakdown }) => {
    if (level === 'critical') {
      metrics.increment('budget.critical', { breakdown })
    }
  },
})
The hook also flows through to devtools and OTel via the standard instrumentation pipeline.
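If you would rather emit one metric per transition (for example, warning to critical) than one per level, the handler can remember the last level it saw. The sketch below is an assumption-laden illustration: `createTransitionHandler` and the `emit` callback are hypothetical names, and the event shape is inferred only from the fields destructured in the example above.

```typescript
// Hypothetical sketch; the event shape is assumed from the hook example above.
type Level = 'normal' | 'warning' | 'critical'

interface BudgetEvent {
  level: Level
  pressure: number
  breakdown: Record<string, number>
}

// Returns a handler that calls `emit` only when the level differs from the last one seen.
function createTransitionHandler(emit: (metricName: string) => void) {
  let lastLevel: Level = 'normal'
  return (event: BudgetEvent): void => {
    if (event.level === lastLevel) return // same level: nothing to record
    emit(`budget.${lastLevel}_to_${event.level}`)
    lastLevel = event.level
  }
}

// Stand-in metrics sink so the sketch runs on its own.
const emitted: string[] = []
const onTransition = createTransitionHandler((name) => emitted.push(name))

onTransition({ level: 'normal', pressure: 0.48, breakdown: { history: 61_000 } })
onTransition({ level: 'warning', pressure: 0.85, breakdown: { history: 108_800 } })
onTransition({ level: 'warning', pressure: 0.9, breakdown: { history: 115_200 } })
onTransition({ level: 'critical', pressure: 0.97, breakdown: { history: 124_160 } })
```

A handler like this could be passed as onBudgetCheck; remembering the last level also keeps the metric count stable if the hook is ever invoked more than once at the same level.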
Choosing thresholds
- warningThreshold: 0.8 is a reasonable default; it gives you headroom for the response itself
- criticalThreshold: 0.95 leaves ~5% for safety; with less headroom than that, you risk a 4xx from the provider
- For models with reserved-output budgets, subtract the response budget from limit before passing it in
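The reserved-output adjustment is simple arithmetic, sketched below. `effectiveLimit` is a hypothetical helper, and the 4,096-token response budget is an example figure, not a value taken from any provider.

```typescript
// Hypothetical helper: the input budget left after reserving room for the response.
function effectiveLimit(contextWindow: number, reservedOutput: number): number {
  if (reservedOutput < 0 || reservedOutput >= contextWindow) {
    throw new RangeError('reservedOutput must be between 0 and the context window size')
  }
  return contextWindow - reservedOutput
}

// Example figures only: a 128k context window with 4,096 tokens reserved for output.
const limit = effectiveLimit(128_000, 4_096) // 123904

// Where the thresholds land, in tokens, for that adjusted limit.
const warningAt = Math.floor(limit * 0.8) // 99123
const criticalAt = Math.floor(limit * 0.95) // 117708
```

The adjusted value is what you would pass as limit, so that the thresholds are measured against the space actually available for input.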