Cascade
Try cheaper models first and escalate when quality checks reject the result.
cascade() is for quality-based escalation. It runs tiers in order. Each tier can inspect the generated result and either accept it or move to the next tier.
Use a cascade when the same task can often be handled by a cheaper model, but some inputs need a stronger model.
import { cascade } from '@crux/core/routing'
import { openai } from '@ai-sdk/openai'
import { anthropic } from '@ai-sdk/anthropic'
export const answerCascade = cascade({
  id: 'answer-quality-cascade',
  description: 'Escalate support answers until evidence and completeness are good enough.',
  tiers: [
    // Tier 1: cheapest model. Accepted only with enough confidence and at least one citation.
    {
      model: openai('gpt-4o-mini'),
      budget: 0.75,
      evaluate: (result) => {
        const answer = result as { object?: { confidence?: number; citations?: unknown[] } }
        const confidence = answer.object?.confidence ?? 0
        const hasCitations = (answer.object?.citations?.length ?? 0) > 0
        return {
          accepted: confidence >= 0.75 && hasCitations,
          confidence,
          budget: 0.75,
          note: hasCitations ? undefined : 'missing citations',
        }
      },
    },
    // Tier 2: stronger model with a higher confidence bar.
    {
      model: anthropic('claude-sonnet-4-20250514'),
      budget: 0.85,
      evaluate: (result) => {
        const answer = result as { object?: { confidence?: number } }
        const confidence = answer.object?.confidence ?? 0
        return {
          accepted: confidence >= 0.85,
          confidence,
          budget: 0.85,
        }
      },
    },
    // Tier 3: no evaluate, so this tier accepts whatever the model returns.
    {
      model: anthropic('claude-opus-4-20250514'),
    },
  ],
  // Best-effort limits for the whole cascade.
  budget: {
    maxCost: 0.05,
    maxLatencyMs: 8_000,
  },
})
The final tier usually has no evaluate, which means "accept whatever this model returns." If every tier has evaluate and every evaluator rejects, Crux throws CascadeExhaustedError.
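The escalation order described above can be pictured with a small standalone model. This is a sketch of the idea only, not Crux's implementation: `Tier`, `runCascade`, and `CascadeExhausted` are hypothetical names, and the tiers are synchronous functions standing in for model calls.

```typescript
// Standalone model of quality-based escalation (illustrative, not the Crux source).
type Evaluation = boolean | { accepted: boolean; note?: string }

interface Tier<T> {
  name: string
  run: (input: string) => T             // stands in for a model call
  evaluate?: (result: T) => Evaluation  // omitted: the tier always accepts
}

class CascadeExhausted extends Error {
  constructor(public readonly lastResult: unknown) {
    super('every tier rejected its result')
    this.name = 'CascadeExhausted'
  }
}

function runCascade<T>(tiers: Tier<T>[], input: string): { result: T; acceptedAtTier: number } {
  let last: T | undefined
  for (let i = 0; i < tiers.length; i++) {
    const tier = tiers[i]
    const result = tier.run(input)
    last = result
    // A tier without an evaluator accepts whatever it produced.
    const verdict = tier.evaluate ? tier.evaluate(result) : true
    const accepted = typeof verdict === 'boolean' ? verdict : verdict.accepted
    if (accepted) return { result, acceptedAtTier: i }
  }
  // Reached only when every tier had an evaluator and every evaluator rejected.
  throw new CascadeExhausted(last)
}

const demoTiers: Tier<string>[] = [
  { name: 'cheap', run: () => 'n/a', evaluate: (r) => r.length > 10 },
  { name: 'strong', run: () => 'a longer, reviewed answer' },
]
console.log(runCascade(demoTiers, 'question').acceptedAtTier) // 1
```

The first tier's three-character answer fails its length check, so the second tier runs and is accepted because it has no evaluator.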
Evaluation Results
An evaluator can return a boolean or a structured result.
type CascadeEvaluation =
  | boolean
  | {
      accepted: boolean
      confidence?: number
      budget?: number
      note?: string
    }
Prefer structured results in production. They make devtools and trace reports explain why a tier was rejected.
evaluate: (result, context) => ({
  accepted: score(result) >= 0.8,
  confidence: score(result),
  budget: 0.8,
  note: `tier ${context.tierIndex} cost ${context.cost ?? 'unknown'}`,
})
The evaluation context includes:
| Field | Meaning |
|---|---|
| model | The selected model id. |
| cost | Tier cost, when the provider reports it. |
| tierIndex | Zero-based tier index. |
| totalCost | Cumulative cost across attempted tiers. |
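As a sketch of how these context fields might feed a structured result, the factory below builds a threshold evaluator that records the tier, model, and cost in its note. The `EvaluationContext` and `StructuredEvaluation` interfaces are assumptions inferred from this page, and `thresholdEvaluator` is a hypothetical helper, not a Crux export.

```typescript
// Assumed shapes, inferred from the table above; Crux's real types may differ.
interface EvaluationContext {
  model: string      // selected model id
  cost?: number      // tier cost, only when the provider reports it
  tierIndex: number  // zero-based
  totalCost: number  // cumulative cost across attempted tiers
}

interface StructuredEvaluation {
  accepted: boolean
  confidence?: number
  budget?: number
  note?: string
}

// Hypothetical helper: accept when the score reaches the threshold.
function thresholdEvaluator<T>(score: (result: T) => number, threshold: number) {
  return (result: T, context: EvaluationContext): StructuredEvaluation => {
    const confidence = score(result) // computed once and reused
    const cost = context.cost === undefined ? 'unknown' : context.cost.toFixed(4)
    return {
      accepted: confidence >= threshold,
      confidence,
      budget: threshold, // follows the examples on this page, which repeat the threshold here
      note: `tier ${context.tierIndex} (${context.model}) scored ${confidence.toFixed(2)}, cost ${cost}`,
    }
  }
}

const checkAnswer = thresholdEvaluator((r: { confidence: number }) => r.confidence, 0.8)
const verdict = checkAnswer(
  { confidence: 0.62 },
  { model: 'gpt-4o-mini', tierIndex: 0, totalCost: 0 },
)
console.log(verdict.accepted, verdict.note)
// false "tier 0 (gpt-4o-mini) scored 0.62, cost unknown"
```

Computing the score once avoids calling a potentially expensive scorer twice per tier.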
Budgets
Cascade budgets are best-effort.
| Budget | Behavior |
|---|---|
| maxCost | Checked only when the provider returns cost metadata. OpenRouter exposes cost; some direct SDKs do not. |
| maxLatencyMs | Checked against wall-clock time across all attempted tiers. |
If a budget is exceeded after a tier runs, Crux returns the last result and sets _meta.cascade.budgetExceeded = true. It does not throw just because the budget was exceeded.
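Because an exceeded budget does not throw, callers that care about it have to check the flag themselves. The helper below is a minimal sketch of that check. `CascadeResultLike` is an assumed shape (only `_meta.cascade.budgetExceeded` comes from the text above) and `unwrapWithBudgetFlag` is a hypothetical name.

```typescript
// Assumed minimal result shape; only _meta.cascade.budgetExceeded is documented above.
interface CascadeResultLike<T> {
  object: T
  _meta?: { cascade?: { budgetExceeded?: boolean } }
}

// Hypothetical helper: return the payload together with an explicit over-budget flag.
function unwrapWithBudgetFlag<T>(result: CascadeResultLike<T>): { value: T; overBudget: boolean } {
  const overBudget = result._meta?.cascade?.budgetExceeded === true
  if (overBudget) {
    // The call did not throw, so this is the place to alert or record a metric.
    console.warn('cascade finished over budget; using the last result anyway')
  }
  return { value: result.object, overBudget }
}

const overBudgetExample = unwrapWithBudgetFlag({
  object: { text: 'late answer' },
  _meta: { cascade: { budgetExceeded: true } },
})
console.log(overBudgetExample.overBudget) // true
```

A missing `_meta` is treated as "within budget" here, which matches the best-effort nature of the checks but is a design choice of this sketch.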
Error Handling
Cascade does not catch provider errors. It only handles quality rejections from evaluate.
Wrap tier models in fallback() when you also need rate-limit, timeout, or outage resilience.
const productionCascade = cascade({
  id: 'production-answer-cascade',
  tiers: [
    { model: fallback(gpt4oMini, claudeHaiku), evaluate: fastQualityCheck },
    { model: fallback(claudeSonnet, gpt4o), evaluate: strongQualityCheck },
    { model: claudeOpus },
  ],
})
Exhaustion
If all tiers reject, Crux throws CascadeExhaustedError with the last result and tier details attached.
import { CascadeExhaustedError } from '@crux/core/routing'
try {
  await generate(prompt, { model: strictCascade, input })
} catch (error) {
  if (error instanceof CascadeExhaustedError) {
    console.log(error.lastResult)
    console.log(error.tierDetails)
  }
}
Metadata
Cascade metadata is attached to result._meta.cascade. It includes attempted tier count, accepted tier, budget status, and every configured tier in order. Skipped tiers are included with status: 'skipped' and note: 'not reached'.
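One way to use this metadata is to fold it into a single log line. The sketch below assumes a `CascadeMeta` shape based on the fields this section describes. The `'accepted'` and `'rejected'` status values are guesses (only `'skipped'` is stated here), and `summarizeCascade` is a hypothetical helper.

```typescript
// Assumed shape. Only status 'skipped' and note 'not reached' are documented;
// 'accepted' and 'rejected' are illustrative guesses.
interface TierReport {
  status: 'accepted' | 'rejected' | 'skipped'
  note?: string
}

interface CascadeMeta {
  tiersAttempted: number
  totalTiers: number
  acceptedAtTier: number
  budgetExceeded: boolean
  tiers: TierReport[]
}

// Hypothetical helper: one-line summary suitable for logs or traces.
function summarizeCascade(meta: CascadeMeta): string {
  const perTier = meta.tiers
    .map((tier, index) => `#${index} ${tier.status}${tier.note ? ` (${tier.note})` : ''}`)
    .join(', ')
  const budget = meta.budgetExceeded ? 'over budget' : 'within budget'
  return `accepted at tier ${meta.acceptedAtTier} after ${meta.tiersAttempted}/${meta.totalTiers} attempts, ${budget}: ${perTier}`
}

const cascadeSummary = summarizeCascade({
  tiersAttempted: 2,
  totalTiers: 3,
  acceptedAtTier: 1,
  budgetExceeded: false,
  tiers: [
    { status: 'rejected', note: 'missing citations' },
    { status: 'accepted' },
    { status: 'skipped', note: 'not reached' },
  ],
})
console.log(cascadeSummary)
```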
result._meta.cascade
// {
//   tiersAttempted: 2,
//   totalTiers: 3,
//   acceptedAtTier: 1,
//   budgetExceeded: false,
//   tiers: [...]
// }
Avoid
- Do not use cascade for provider errors. Use fallback().
- Do not put a tier without evaluate before later tiers; that tier always accepts.
- Do not use cascade with stream(). Cascade works with generate() because it needs the completed result.