Cascade

Try cheaper models first and escalate when quality checks reject the result.

cascade() performs quality-based escalation. It runs tiers in order, and each tier's evaluate function inspects the generated result and either accepts it or escalates to the next tier.

Use a cascade when the same task can often be handled by a cheaper model, but some inputs need a stronger model.

answer-cascade.ts
import { cascade } from '@crux/core/routing'
import { openai } from '@ai-sdk/openai'
import { anthropic } from '@ai-sdk/anthropic'

export const answerCascade = cascade({
  id: 'answer-quality-cascade',
  description: 'Escalate support answers until evidence and completeness are good enough.',
  tiers: [
    {
      model: openai('gpt-4o-mini'),
      budget: 0.75,
      evaluate: (result) => {
        const answer = result as { object?: { confidence?: number; citations?: unknown[] } }
        const confidence = answer.object?.confidence ?? 0
        const hasCitations = (answer.object?.citations?.length ?? 0) > 0

        return {
          accepted: confidence >= 0.75 && hasCitations,
          confidence,
          budget: 0.75,
          note: hasCitations ? undefined : 'missing citations',
        }
      },
    },
    {
      model: anthropic('claude-sonnet-4-20250514'),
      budget: 0.85,
      evaluate: (result) => {
        const answer = result as { object?: { confidence?: number } }
        const confidence = answer.object?.confidence ?? 0

        return {
          accepted: confidence >= 0.85,
          confidence,
          budget: 0.85,
        }
      },
    },
    {
      model: anthropic('claude-opus-4-20250514'),
    },
  ],
  budget: {
    maxCost: 0.05,
    maxLatencyMs: 8_000,
  },
})

The final tier usually has no evaluate, which means "accept whatever this model returns." If every tier has evaluate and every evaluator rejects, Crux throws CascadeExhaustedError.
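
The control flow described above can be sketched without the library. This is a conceptual, synchronous stand-in and not Crux's implementation: `Tier`, `runCascade`, and `ExhaustedError` are hypothetical names, and `run` stands in for calling a tier's model.

```typescript
// Conceptual sketch only; none of these names are Crux exports.
type Evaluation = boolean | { accepted: boolean; note?: string }

interface Tier<T> {
  run: () => T // stands in for calling the tier's model
  evaluate?: (result: T) => Evaluation
}

class ExhaustedError<T> extends Error {
  lastResult: T | undefined
  constructor(lastResult: T | undefined) {
    super('every tier rejected its result')
    this.lastResult = lastResult
  }
}

function runCascade<T>(tiers: Tier<T>[]): { result: T; acceptedAtTier: number } {
  let last: T | undefined
  for (let i = 0; i < tiers.length; i++) {
    const tier = tiers[i]
    last = tier.run()
    // A tier without an evaluator accepts whatever it produced.
    if (!tier.evaluate) return { result: last, acceptedAtTier: i }
    const verdict = tier.evaluate(last)
    const accepted = typeof verdict === 'boolean' ? verdict : verdict.accepted
    if (accepted) return { result: last, acceptedAtTier: i }
    // Rejected: fall through and escalate to the next tier.
  }
  // Only reachable when every tier had an evaluator and every one rejected.
  throw new ExhaustedError(last)
}

const outcome = runCascade<string>([
  { run: () => 'short', evaluate: (r) => r.length > 10 },
  { run: () => 'a longer, better answer' },
])
```

Here the first tier's answer is rejected for being too short, so the second tier, which has no evaluator, supplies the accepted result.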

Evaluation Results

An evaluator can return a boolean or a structured result.

type CascadeEvaluation =
  | boolean
  | {
      accepted: boolean
      confidence?: number
      budget?: number
      note?: string
    }

Prefer structured results in production. They make devtools and trace reports explain why a tier was rejected.
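
To see why the structured form is easier to report on, here is a minimal sketch that normalizes both forms and renders one log line per evaluation. The type is copied from the definition above; `toStructured` and `explain` are hypothetical helpers, not part of Crux.

```typescript
// Shape copied from the CascadeEvaluation type above.
type CascadeEvaluation =
  | boolean
  | { accepted: boolean; confidence?: number; budget?: number; note?: string }

type StructuredEvaluation = Exclude<CascadeEvaluation, boolean>

// Hypothetical helper: wrap a bare boolean so every record has the same shape.
function toStructured(evaluation: CascadeEvaluation): StructuredEvaluation {
  return typeof evaluation === 'boolean' ? { accepted: evaluation } : evaluation
}

// Hypothetical helper: one human-readable line per evaluation.
function explain(evaluation: CascadeEvaluation): string {
  const e = toStructured(evaluation)
  const verdict = e.accepted ? 'accepted' : 'rejected'
  const score =
    e.confidence === undefined
      ? ''
      : ` (confidence ${e.confidence}, budget ${e.budget ?? 'n/a'})`
  const note = e.note ? `: ${e.note}` : ''
  return verdict + score + note
}

const bare = explain(false)
const detailed = explain({
  accepted: false,
  confidence: 0.6,
  budget: 0.75,
  note: 'missing citations',
})
```

A bare `false` can only be reported as "rejected", while the structured result carries the score and the reason along with the verdict.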

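evaluate: (result, context) => {
  // score() is your own scoring function; compute it once and reuse the value.
  const confidence = score(result)
  return {
    accepted: confidence >= 0.8,
    confidence,
    budget: 0.8,
    note: `tier ${context.tierIndex} cost ${context.cost ?? 'unknown'}`,
  }
}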

The evaluation context includes:

| Field | Meaning |
| --- | --- |
| model | The selected model id. |
| cost | Tier cost, when the provider reports it. |
| tierIndex | Zero-based tier index. |
| totalCost | Cumulative cost across attempted tiers. |
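
As one possible use of these fields, the sketch below relaxes the acceptance threshold once cumulative spend passes a soft ceiling, so a borderline answer is kept instead of paying for another tier. `EvaluationContext` is an assumed shape based on the fields listed above (treating `totalCost` as possibly missing is a guess), and `makeEvaluator` and its options are hypothetical.

```typescript
// Assumed context shape, based on the fields listed above; not imported from Crux.
interface EvaluationContext {
  model: string
  cost?: number
  tierIndex: number
  totalCost?: number // assumption: may be missing when no provider reported cost
}

interface EvaluatorOptions {
  strict: number // threshold while spend is low
  relaxed: number // threshold once spend reaches the soft ceiling
  softCostCeiling: number
}

// Hypothetical factory that builds an evaluator using the context.
function makeEvaluator(options: EvaluatorOptions) {
  return (result: { confidence: number }, context: EvaluationContext) => {
    const spent = context.totalCost ?? 0
    const required = spent >= options.softCostCeiling ? options.relaxed : options.strict
    return {
      accepted: result.confidence >= required,
      confidence: result.confidence,
      budget: required,
      note: `tier ${context.tierIndex} (${context.model}), spent ${context.totalCost ?? 'unknown'}`,
    }
  }
}

const evaluate = makeEvaluator({ strict: 0.8, relaxed: 0.7, softCostCeiling: 0.03 })
const early = evaluate({ confidence: 0.75 }, { model: 'small', tierIndex: 0, totalCost: 0.01 })
const late = evaluate({ confidence: 0.75 }, { model: 'medium', tierIndex: 1, totalCost: 0.04 })
```

The same 0.75 confidence is rejected at the first tier and accepted at the second, because the cumulative spend has crossed the ceiling by then.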

Budgets

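Cascade budgets are best-effort: Crux checks them after a tier runs and reports an overrun in metadata rather than enforcing a hard limit.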

| Budget | Behavior |
| --- | --- |
| maxCost | Checked only when the provider returns cost metadata. OpenRouter exposes cost; some direct SDKs do not. |
| maxLatencyMs | Checked against wall-clock time across all attempted tiers. |

If a budget is exceeded after a tier runs, Crux returns the last result and sets _meta.cascade.budgetExceeded = true. It does not throw just because the budget was exceeded.
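
The budget rules above can be approximated with a small check that runs after each tier. This is a sketch under assumptions, not Crux's code: `CascadeBudget` mirrors the `budget` option shown earlier, `TierUsage` and `isBudgetExceeded` are hypothetical, summed per-tier latency stands in for wall-clock time, and the strict `>` comparison is a guess.

```typescript
// Mirrors the `budget` option shown earlier; both limits are optional here.
interface CascadeBudget {
  maxCost?: number
  maxLatencyMs?: number
}

// Hypothetical record of what one attempted tier consumed.
interface TierUsage {
  cost?: number // undefined when the provider reports no cost metadata
  latencyMs: number
}

function isBudgetExceeded(budget: CascadeBudget, attempts: TierUsage[]): boolean {
  // Summed tier latency approximates wall-clock time across attempted tiers.
  const elapsedMs = attempts.reduce((sum, a) => sum + a.latencyMs, 0)
  const overLatency = budget.maxLatencyMs !== undefined && elapsedMs > budget.maxLatencyMs

  // The cost check applies only to tiers that actually reported a cost.
  const reported = attempts.filter((a) => a.cost !== undefined)
  const totalCost = reported.reduce((sum, a) => sum + (a.cost ?? 0), 0)
  const overCost =
    budget.maxCost !== undefined && reported.length > 0 && totalCost > budget.maxCost

  return overLatency || overCost
}

const budget: CascadeBudget = { maxCost: 0.05, maxLatencyMs: 8_000 }
const withoutCostData = isBudgetExceeded(budget, [{ latencyMs: 1_000 }, { latencyMs: 2_000 }])
const overSpend = isBudgetExceeded(budget, [
  { cost: 0.02, latencyMs: 1_000 },
  { cost: 0.04, latencyMs: 2_000 },
])
```

When no tier reports a cost, the `maxCost` limit can never trip, which is why the latency limit is the more dependable of the two.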

Error Handling

Cascade does not catch provider errors. It only handles quality rejections from evaluate.

Wrap tier models in fallback() when you also need rate-limit, timeout, or outage resilience.

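// fallback() is assumed to be imported alongside cascade(); the model handles
// and the two quality checks are defined elsewhere in your application.
const productionCascade = cascade({
  id: 'production-answer-cascade',
  tiers: [
    { model: fallback(gpt4oMini, claudeHaiku), evaluate: fastQualityCheck },
    { model: fallback(claudeSonnet, gpt4o), evaluate: strongQualityCheck },
    { model: claudeOpus },
  ],
})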
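
The division of labor can be illustrated with a toy model: an error-level wrapper retries on thrown errors, and only a result that comes back without an error reaches the quality check. `firstThatSucceeds` is a hypothetical, synchronous stand-in for the idea behind fallback(), not its implementation.

```typescript
// Hypothetical stand-in: try each candidate in order until one does not throw.
function firstThatSucceeds<T>(...candidates: Array<() => T>): () => T {
  return () => {
    let lastError: unknown = new Error('no candidates were provided')
    for (const candidate of candidates) {
      try {
        return candidate()
      } catch (error) {
        lastError = error // provider-style failure: move on to the next candidate
      }
    }
    throw lastError
  }
}

// The primary "provider" fails with a rate limit; the backup answers.
const tierModel = firstThatSucceeds<string>(
  () => {
    throw new Error('429 rate limited')
  },
  () => 'answer from the backup provider',
)

// The quality check only ever sees a completed result.
const answer = tierModel()
const accepted = answer.length > 0
```

If every candidate throws, the last error propagates to the caller, which matches the statement above that a cascade does not catch provider errors itself.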

Exhaustion

If all tiers reject, Crux throws CascadeExhaustedError with the last result and tier details attached.

import { CascadeExhaustedError } from '@crux/core/routing'

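try {
  await generate(prompt, { model: strictCascade, input })
} catch (error) {
  if (error instanceof CascadeExhaustedError) {
    // Every tier rejected: inspect the last attempt and the per-tier details.
    console.log(error.lastResult)
    console.log(error.tierDetails)
  } else {
    // Provider and other errors are not cascade exhaustion; let them propagate.
    throw error
  }
}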
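
One way to act on exhaustion is to return the last result as a degraded answer instead of failing the request. The sketch below uses a local stand-in class so it runs without Crux; the field types (including `tierDetails` being an array) are assumptions, `settle` is a hypothetical helper, and the flow is synchronous for brevity, whereas generate() is awaited in the example above.

```typescript
// Local stand-in for CascadeExhaustedError; field types are assumptions.
class ExhaustedErrorStandIn extends Error {
  lastResult: unknown
  tierDetails: unknown[]
  constructor(lastResult: unknown, tierDetails: unknown[]) {
    super('all tiers rejected')
    this.lastResult = lastResult
    this.tierDetails = tierDetails
  }
}

type Outcome<T> =
  | { kind: 'ok'; value: T }
  | { kind: 'degraded'; value: unknown; tiersTried: number }

// Hypothetical helper: convert exhaustion into a degraded outcome, rethrow the rest.
function settle<T>(run: () => T): Outcome<T> {
  try {
    return { kind: 'ok', value: run() }
  } catch (error) {
    if (error instanceof ExhaustedErrorStandIn) {
      return { kind: 'degraded', value: error.lastResult, tiersTried: error.tierDetails.length }
    }
    throw error // provider errors and bugs should still surface
  }
}

const degraded = settle<string>(() => {
  throw new ExhaustedErrorStandIn('best effort answer', [{}, {}, {}])
})
```

Whether a degraded answer is acceptable depends on the product: for a support reply it may be shown with a warning, while for an automated action it is usually safer to fail.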

Metadata

Cascade metadata is attached to result._meta.cascade. It includes attempted tier count, accepted tier, budget status, and every configured tier in order. Skipped tiers are included with status: 'skipped' and note: 'not reached'.

result._meta.cascade
// {
//   tiersAttempted: 2,
//   totalTiers: 3,
//   acceptedAtTier: 1,
//   budgetExceeded: false,
//   tiers: [...]
// }
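
For logging, the metadata can be condensed into a single line. The sketch below assumes the shape shown above; the per-tier entry type is a guess beyond the documented 'skipped' status and 'not reached' note (the 'rejected' and 'accepted' status values are assumptions), and `describeCascade` is a hypothetical helper.

```typescript
// Assumed shape, based on the metadata example above.
interface CascadeMeta {
  tiersAttempted: number
  totalTiers: number
  acceptedAtTier: number
  budgetExceeded: boolean
  tiers: Array<{ status: string; note?: string }>
}

// Hypothetical helper: one summary line for logs or dashboards.
function describeCascade(meta: CascadeMeta): string {
  const skipped = meta.tiers.filter((tier) => tier.status === 'skipped').length
  const parts = [
    `accepted at tier ${meta.acceptedAtTier}`,
    `${meta.tiersAttempted}/${meta.totalTiers} tiers attempted`,
    `${skipped} skipped`,
  ]
  if (meta.budgetExceeded) parts.push('budget exceeded')
  return parts.join(', ')
}

const summary = describeCascade({
  tiersAttempted: 2,
  totalTiers: 3,
  acceptedAtTier: 1,
  budgetExceeded: false,
  tiers: [
    { status: 'rejected' }, // assumed status value
    { status: 'accepted' }, // assumed status value
    { status: 'skipped', note: 'not reached' },
  ],
})
```

Tracking how often `acceptedAtTier` is greater than zero gives a rough escalation rate, which helps decide whether the first tier's threshold is too strict.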

Avoid

  • Do not use cascade for provider errors. Use fallback().
  • Do not put a tier without evaluate before later tiers; that tier always accepts, so the tiers after it are never reached.
  • Do not use cascade with stream(). Cascade works only with generate(), because evaluators need the completed result.
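
The second rule is easy to check mechanically before a cascade is constructed. The sketch below is a hypothetical lint helper, not a Crux API: it returns the indexes of tiers that can never run because an earlier tier has no evaluate.

```typescript
// Minimal view of a tier: only the presence of `evaluate` matters here.
interface TierLike {
  evaluate?: unknown
}

// Hypothetical lint helper: indexes of tiers placed after an always-accepting tier.
function findUnreachableTiers(tiers: TierLike[]): number[] {
  const firstAlwaysAccepting = tiers.findIndex((tier) => tier.evaluate === undefined)
  if (firstAlwaysAccepting === -1) return []
  return tiers
    .slice(firstAlwaysAccepting + 1)
    .map((_, offset) => firstAlwaysAccepting + 1 + offset)
}

const check = () => true
const wellFormed = findUnreachableTiers([{ evaluate: check }, { evaluate: check }, {}])
const misordered = findUnreachableTiers([{ evaluate: check }, {}, { evaluate: check }])
```

A non-empty result, as in the `misordered` case, points at tiers that should be moved earlier or removed.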
