Token Management

When the assembled system message exceeds a token budget, the lowest-priority contexts are dropped first. Provider-specific adaptations let you tweak prompts per model without breaking portability.

generate.ts

import { prompt, context } from '@crux/core'
import { generate } from '@crux/ai'

const critical = context({ priority: 100, system: '## Critical Rules' })  // never dropped
const guidelines = context({ priority: 50, system: '## Guidelines\n...' })
const examples = context({ priority: 20, system: '## Examples\n...' })    // dropped first

const myPrompt = prompt({
  use: [critical, examples, guidelines],
  system: 'You are an assistant.',
})

const result = await generate(myPrompt, {
  model,
  input: { ... },
  tokenBudget: 2000,
})

Use .inspect() to see exactly which contexts were kept, dropped, and why — see Type System for the full inspection API.

How it works

1. Conditional contexts are evaluated first: when predicates, when() wrappers, and match() specs are resolved. Excluded contexts are never even resolved.
2. The prompt's own system text is always included.
3. Active context contributions are sorted by priority (lowest first for dropping).
4. Lowest-priority contexts are dropped until the total fits within budget.
5. Use .inspect() to see exactly what was dropped, excluded, and why.
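
The steps above can be sketched as a small two-phase function. This is a minimal illustration under stated assumptions, not the library's implementation: SketchContext, SketchReport, assembleSystem, and the "\n\n" separator are hypothetical, and token counts use the chars / 4 estimate (rounding up is an assumption).

```typescript
// Hypothetical shapes for illustration; these are not the library's real types.
interface SketchContext<I> {
  id: string
  priority: number
  when?: (input: I) => boolean
  system: (input: I) => string
}

interface SketchReport {
  system: string
  excludedContexts: string[]
  droppedContexts: string[]
}

// Rough estimate of chars / 4; rounding up is an assumption.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4)

function assembleSystem<I>(
  baseSystem: string,
  contexts: SketchContext<I>[],
  input: I,
  tokenBudget: number,
): SketchReport {
  // Step 1: evaluate conditions first. Excluded contexts are never resolved.
  const active: SketchContext<I>[] = []
  const excludedContexts: string[] = []
  for (const ctx of contexts) {
    if (ctx.when === undefined || ctx.when(input)) active.push(ctx)
    else excludedContexts.push(ctx.id)
  }

  // Resolve only the active contexts, keeping declaration order for the output.
  const kept = active.map((ctx) => ({
    id: ctx.id,
    priority: ctx.priority,
    text: ctx.system(input),
  }))

  // Step 2: the base system text is always part of the output and never dropped.
  const render = (): string => [baseSystem, ...kept.map((c) => c.text)].join('\n\n')

  // Steps 3-4: remove the lowest-priority context until the text fits the budget.
  const droppedContexts: string[] = []
  while (kept.length > 0 && estimateTokens(render()) > tokenBudget) {
    const lowest = kept.reduce((min, c) => (c.priority < min.priority ? c : min))
    kept.splice(kept.indexOf(lowest), 1)
    droppedContexts.push(lowest.id)
  }

  return { system: render(), excludedContexts, droppedContexts }
}
```

In this sketch, a context excluded in step 1 never has its system function called, while a dropped context was resolved before being removed.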

Conditional exclusion vs token-budget dropping

These are two different mechanisms with different semantics:

| | Conditional exclusion (`when`/`match`) | Token-budget dropping |
| --- | --- | --- |
| When evaluated | Before resolution | After resolution |
| systemFn called? | No | Yes (then dropped) |
| Tools contributed? | No | Yes (tools always survive drops) |
| Inspect status | `excludedContexts[]` | `droppedContexts[]` |
| Use case | Input-dependent composition | Token pressure management |

Use when/match when a context is irrelevant to the current request (wrong mode, missing data, feature disabled). Use priority for graceful degradation under token pressure.

// Conditional: don't include research context if there's no research data
const research = context({
  input: z.object({ synthesis: z.string().optional() }),
  when: ({ input }) => !!input.synthesis, // excluded when no data
  priority: 40, // AND droppable under pressure
  system: ({ input }) => `## Research\n${input.synthesis}`,
})

Custom tokenizer

The default tokenizer estimates tokens as chars / 4. For accurate counts, provide a real tokenizer via config():

// `encode` is assumed to come from a tokenizer library and to return an array of token IDs
config({
  generation: {
    tokenizer: (text) => encode(text).length,
  },
})

Or standalone: setTokenizer((text) => encode(text).length)
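
To make the tokenizer contract concrete, here is a minimal sketch of a swappable token counter. A tokenizer is just a function from text to a token count. The names Tokenizer, useTokenizer, and countTokens are hypothetical stand-ins rather than the library's API, and the whitespace tokenizer is a toy used only to show the swap.

```typescript
type Tokenizer = (text: string) => number

// Default estimate of roughly one token per four characters (rounding up is an assumption).
const defaultTokenizer: Tokenizer = (text) => Math.ceil(text.length / 4)

let activeTokenizer: Tokenizer = defaultTokenizer

// Hypothetical stand-in for a global setter such as setTokenizer.
function useTokenizer(tokenizer: Tokenizer): void {
  activeTokenizer = tokenizer
}

// Every budget check goes through the currently active tokenizer.
function countTokens(text: string): number {
  return activeTokenizer(text)
}

// Toy tokenizer that counts whitespace-separated words; a real one would come from a tokenizer library.
const whitespaceTokenizer: Tokenizer = (text) =>
  text.split(/\s+/).filter((word) => word.length > 0).length
```

Because budget decisions depend on these counts, a rough estimate can drop contexts that would have fit, or keep ones that do not.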

Context Caching

Contexts with expensive async resolvers can be cached with a single cache option. Caching skips redundant resolver calls and enables provider-level token caching (90% discount on Anthropic, 50% on OpenAI).

const brand = context({
  id: 'brand-voice',
  system: async ({ input }) => fetchBrandProfile(input.orgId),
  cache: 300_000, // 5min TTL + provider caching
})

See the Context Caching guide for full details on cache option forms, provider behavior, and observability.
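
As a rough illustration of the resolver-skipping half of this feature, the sketch below wraps an async resolver in a TTL cache. It assumes nothing about how the library implements caching: withTtlCache, keyOf, and the injectable now clock are hypothetical, and provider-level token caching is not modeled.

```typescript
type Resolver<I> = (input: I) => Promise<string>

// Wraps an async resolver so calls with the same key within ttlMs reuse the first result.
function withTtlCache<I>(
  resolver: Resolver<I>,
  keyOf: (input: I) => string,
  ttlMs: number,
  now: () => number = Date.now,
): Resolver<I> {
  const entries = new Map<string, { value: Promise<string>; expiresAt: number }>()

  return (input) => {
    const key = keyOf(input)
    const hit = entries.get(key)
    if (hit && hit.expiresAt > now()) return hit.value

    // Cache the promise itself so concurrent callers share one in-flight request.
    const value = resolver(input)
    entries.set(key, { value, expiresAt: now() + ttlMs })

    // Evict failed lookups so the next call retries instead of replaying the error.
    value.catch(() => {
      if (entries.get(key)?.value === value) entries.delete(key)
    })
    return value
  }
}
```

With a 300_000 ms TTL, repeated requests for the same orgId inside five minutes would call the underlying fetch once in this sketch.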

Provider-specific adaptations

Different models sometimes need different prompting strategies. The adapt field lets you apply provider-specific tweaks:

prompts.ts

const myPrompt = prompt({
  system: 'You are a helpful assistant.',
  adapt: {
    anthropic: {
      appendSystem: '\nReturn raw JSON, no markdown fences.',
    },
    openai: {
      prependPrompt: 'Think step by step.\n\n',
      settings: { temperature: 0.1 },
    },
    '*': { appendSystem: '\nRespond with valid JSON only.' },
  },
})

Resolution priority: exact provider match > modelId prefix (for OpenRouter-style routing) > '*' wildcard.
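
The lookup order can be sketched as follows. This is an illustration, not the library's code: Adaptation, AdaptMap, and pickAdaptation are hypothetical names, and two details are assumptions: that a single entry wins rather than entries being merged, and that the longest matching modelId prefix is preferred.

```typescript
interface Adaptation {
  appendSystem?: string
  prependPrompt?: string
  settings?: Record<string, unknown>
}

type AdaptMap = Record<string, Adaptation>

function pickAdaptation(
  adapt: AdaptMap,
  provider: string,
  modelId: string,
): Adaptation | undefined {
  // 1. An exact provider match wins outright.
  const exact = adapt[provider]
  if (exact) return exact

  // 2. Otherwise try keys that prefix the modelId, e.g. 'openai' for 'openai/gpt-4o'.
  //    Preferring the longest prefix is an assumption.
  const prefix = Object.keys(adapt)
    .filter((key) => key !== '*' && modelId.startsWith(key))
    .sort((a, b) => b.length - a.length)[0]
  if (prefix !== undefined) return adapt[prefix]

  // 3. Fall back to the wildcard entry, if any.
  return adapt['*']
}
```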

adapt tweaks are applied during .resolve(), so .inspect() output reflects them. Test with inspect({ provider: 'anthropic' }) to verify provider-specific changes.

Settings merge with priority: config.settings < adapt.settings < call-site overrides.
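
Read left to right, that ordering means later sources override earlier ones. A minimal sketch, assuming a shallow key-by-key merge (whether the library merges nested settings deeply is not stated here; mergeSettings is a hypothetical helper):

```typescript
type Settings = Record<string, unknown>

// Later sources win: config.settings < adapt.settings < call-site overrides.
function mergeSettings(config: Settings, adapt: Settings, callSite: Settings): Settings {
  return { ...config, ...adapt, ...callSite }
}

// adapt lowers temperature; the call site overrides maxTokens; topP passes through from config.
const merged = mergeSettings(
  { temperature: 0.7, maxTokens: 1024, topP: 1 },
  { temperature: 0.1 },
  { maxTokens: 256 },
)
```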
