Crux
GuidesContexts

Token Management

Token-aware rendering and automatic context dropping under budget pressure.

When the assembled system message exceeds a token budget, the lowest-priority contexts are dropped first. Provider-specific adaptations let you tweak prompts per model without breaking portability.

generate.ts
import { prompt, context } from '@crux/core'
import { generate } from '@crux/ai'

const critical = context({ priority: 100, system: '## Critical Rules' })  // never dropped
const guidelines = context({ priority: 50, system: '## Guidelines\n...' })
const examples = context({ priority: 20, system: '## Examples\n...' })    // dropped first

const myPrompt = prompt({
  use: [critical, examples, guidelines],
  system: 'You are an assistant.',
})

const result = await generate(myPrompt, {
  model,
  input: { ... },
  tokenBudget: 2000,
})

Use .inspect() to see exactly which contexts were kept, dropped, and why — see Type System for the full inspection API.

How it works

  1. Conditional contexts are evaluated firstwhen predicates, when() wrappers, and match() specs are resolved. Excluded contexts are never even resolved.
  2. The prompt's own system text is always included
  3. Active context contributions are sorted by priority (lowest first for dropping)
  4. Lowest-priority contexts are dropped until the total fits within budget
  5. Use .inspect() to see exactly what was dropped, excluded, and why

Conditional exclusion vs token-budget dropping

These are two different mechanisms with different semantics:

Conditional exclusion (when/match)Token-budget dropping
When evaluatedBefore resolutionAfter resolution
systemFn called?NoYes (then dropped)
Tools contributed?NoYes (tools always survive drops)
Inspect statusexcludedContexts[]droppedContexts[]
Use caseInput-dependent compositionToken pressure management

Use when/match when a context is irrelevant to the current request (wrong mode, missing data, feature disabled). Use priority for graceful degradation under token pressure.

// Conditional: don't include research context if there's no research data
const research = context({
  input: z.object({ synthesis: z.string().optional() }),
  when: ({ input }) => !!input.synthesis, // excluded when no data
  priority: 40, // AND droppable under pressure
  system: ({ input }) => `## Research\n${input.synthesis}`,
})

Custom tokenizer

The default tokenizer estimates tokens as chars / 4. For accurate counts, provide a real tokenizer via config():

config({
  generation: {
    tokenizer: (text) => encode(text).length,
  },
})

Or standalone: setTokenizer((text) => encode(text).length)

Context Caching

Contexts with expensive async resolvers can be cached with a single cache option — skipping redundant resolver calls and enabling provider-level token caching (90% discount on Anthropic, 50% on OpenAI).

const brand = context({
  id: 'brand-voice',
  system: async ({ input }) => fetchBrandProfile(input.orgId),
  cache: 300_000, // 5min TTL + provider caching
})

See the Context Caching guide for full details on cache option forms, provider behavior, and observability.

Provider-specific adaptations

Different models sometimes need different prompting strategies. The adapt field lets you apply provider-specific tweaks:

prompts.ts
const myPrompt = prompt({
  system: 'You are a helpful assistant.',
  adapt: {
    anthropic: {
      appendSystem: '\nReturn raw JSON, no markdown fences.',
    },
    openai: {
      prependPrompt: 'Think step by step.\n\n',
      settings: { temperature: 0.1 },
    },
    '*': { appendSystem: '\nRespond with valid JSON only.' },
  },
})

Resolution priority: exact provider match > modelId prefix (for OpenRouter-style routing) > '*' wildcard.

adapt tweaks are applied during .resolve(), so .inspect() output reflects them. Test with inspect({ provider: 'anthropic' }) to verify provider-specific changes.

Settings merge with priority: config.settings < adapt.settings < call-site overrides.

Next steps

On this page