Token Management
Token-aware rendering and automatic context dropping under budget pressure.
When the assembled system message exceeds a token budget, the lowest-priority contexts are dropped first. Provider-specific adaptations let you tweak prompts per model without breaking portability.
import { prompt, context } from '@crux/core'
import { generate } from '@crux/ai'
const critical = context({ priority: 100, system: '## Critical Rules' }) // never dropped
const guidelines = context({ priority: 50, system: '## Guidelines\n...' })
const examples = context({ priority: 20, system: '## Examples\n...' }) // dropped first
const myPrompt = prompt({
use: [critical, examples, guidelines],
system: 'You are an assistant.',
})
const result = await generate(myPrompt, {
model,
input: { ... },
tokenBudget: 2000,
})Use .inspect() to see exactly which contexts were kept, dropped, and why — see Type
System for the full inspection API.
How it works
- Conditional contexts are evaluated first —
whenpredicates,when()wrappers, andmatch()specs are resolved. Excluded contexts are never even resolved. - The prompt's own system text is always included
- Active context contributions are sorted by priority (lowest first for dropping)
- Lowest-priority contexts are dropped until the total fits within budget
- Use
.inspect()to see exactly what was dropped, excluded, and why
Conditional exclusion vs token-budget dropping
These are two different mechanisms with different semantics:
Conditional exclusion (when/match) | Token-budget dropping | |
|---|---|---|
| When evaluated | Before resolution | After resolution |
| systemFn called? | No | Yes (then dropped) |
| Tools contributed? | No | Yes (tools always survive drops) |
| Inspect status | excludedContexts[] | droppedContexts[] |
| Use case | Input-dependent composition | Token pressure management |
Use when/match when a context is irrelevant to the current request (wrong mode, missing data, feature disabled). Use priority for graceful degradation under token pressure.
// Conditional: don't include research context if there's no research data
const research = context({
input: z.object({ synthesis: z.string().optional() }),
when: ({ input }) => !!input.synthesis, // excluded when no data
priority: 40, // AND droppable under pressure
system: ({ input }) => `## Research\n${input.synthesis}`,
})Custom tokenizer
The default tokenizer estimates tokens as chars / 4. For accurate counts, provide a real tokenizer via config():
config({
generation: {
tokenizer: (text) => encode(text).length,
},
})Or standalone: setTokenizer((text) => encode(text).length)
Context Caching
Contexts with expensive async resolvers can be cached with a single cache option — skipping redundant resolver calls and enabling provider-level token caching (90% discount on Anthropic, 50% on OpenAI).
const brand = context({
id: 'brand-voice',
system: async ({ input }) => fetchBrandProfile(input.orgId),
cache: 300_000, // 5min TTL + provider caching
})See the Context Caching guide for full details on cache option forms, provider behavior, and observability.
Provider-specific adaptations
Different models sometimes need different prompting strategies. The adapt field lets you apply provider-specific tweaks:
const myPrompt = prompt({
system: 'You are a helpful assistant.',
adapt: {
anthropic: {
appendSystem: '\nReturn raw JSON, no markdown fences.',
},
openai: {
prependPrompt: 'Think step by step.\n\n',
settings: { temperature: 0.1 },
},
'*': { appendSystem: '\nRespond with valid JSON only.' },
},
})Resolution priority: exact provider match > modelId prefix (for OpenRouter-style routing) > '*' wildcard.
adapt tweaks are applied during .resolve(), so .inspect() output reflects them. Test with inspect({ provider: 'anthropic' }) to verify provider-specific changes.
Settings merge with priority: config.settings < adapt.settings < call-site overrides.