Context Caching

Cache expensive context resolvers and leverage LLM provider token caching for cost savings.

Without caching, context resolvers that call databases, RAG systems, or APIs run on every prompt.resolve() call. The cache option on context() addresses two problems at once:

Application-level caching: skip the resolver function when the same inputs would produce the same output
Provider-level token caching: tell Anthropic to cache this content block (up to a 90% discount on cached input tokens) or benefit from OpenAI's automatic prefix caching (roughly a 50% discount, depending on the model)

Quick Start

contexts.ts

import { context } from '@crux/core'
import { z } from 'zod'

const brand = context({
  id: 'brand-voice',
  input: z.object({ orgId: z.string() }),
  system: async ({ input }) => {
    const data = await fetchBrandProfile(input.orgId) // expensive DB call
    return `## Brand Voice\n${data.guidelines}`
  },
  cache: 300_000, // 5 minutes — enables both caching layers
})

That's it. The first call runs the resolver and caches the result. Subsequent calls with the same orgId within 5 minutes skip the resolver entirely. The Anthropic adapter also marks this block with cache_control: { type: 'ephemeral' } for provider-level token caching.

Cache Option Forms

Form	Resolver TTL	Provider Cache
`cache: 300_000`	300s	Yes
`cache: true`	300s (5min default)	Yes
`cache: { ttl: 60_000 }`	60s	Yes
`cache: { ttl: 60_000, providerCache: false }`	60s	No
`cache: { providerCache: true }`	None	Yes
Not set / `false`	None	No

The 5-minute default TTL matches Anthropic's default cache window. Anthropic refreshes that window on every cache hit, so a context that is read at least once every 5 minutes stays cached.
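
The table above can be read as a normalization step that maps every accepted form onto two settings. The following is a minimal sketch of that mapping, not Crux's actual implementation: normalizeCacheOption, NormalizedCache, and DEFAULT_TTL_MS are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper, not part of the documented @crux/core API.
type CacheOption = boolean | number | { ttl?: number; providerCache?: boolean }

interface NormalizedCache {
  ttlMs: number // 0 means resolver caching is off
  providerCache: boolean // whether the block is marked for provider token caching
}

const DEFAULT_TTL_MS = 300_000 // 5 minutes, the default listed in the table

function normalizeCacheOption(cache?: CacheOption): NormalizedCache {
  // Not set / false: both layers off
  if (cache === undefined || cache === false) return { ttlMs: 0, providerCache: false }
  // true: default TTL, provider cache on
  if (cache === true) return { ttlMs: DEFAULT_TTL_MS, providerCache: true }
  // number: explicit TTL in milliseconds, provider cache on
  if (typeof cache === 'number') return { ttlMs: cache, providerCache: true }
  // object: no TTL unless given; provider cache on unless explicitly disabled
  return { ttlMs: cache.ttl ?? 0, providerCache: cache.providerCache ?? true }
}
```

Each row of the table corresponds to one branch, which makes it easy to see that only the object form can separate the two layers.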

Fine-Grained Control

Use the object form when the two caching layers need different settings:

contexts.ts

// Cache the resolver but DON'T mark as a provider cache breakpoint
// (content varies too much for prefix caching)
const searchResults = context({
  id: 'search',
  input: z.object({ query: z.string() }),
  system: async ({ input }) => ragSearch(input.query),
  cache: { ttl: 30_000, providerCache: false },
})

// Provider cache only — cheap to compute but stable across calls
const rules = context({
  id: 'rules',
  system: '## Rules\nAlways respond in JSON.',
  cache: { providerCache: true },
})

How It Works

Application-Level (Resolver Caching)

When the cache option resolves to a TTL greater than 0 (cacheTtl > 0):

A cache key is computed from contextId + a stable hash of the input fields declared in the context's inputSchema
On cache hit: the cached text is returned, systemFn() is never called
On cache miss: systemFn() runs, the result is stored with TTL
Unrelated prompt-level input fields don't affect the cache key
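
The steps above can be sketched as a small TTL cache. This is an illustration under stated assumptions rather than Crux's internals: ResolverCache, cacheKey, and stableStringify are hypothetical names, and a sorted-key JSON string stands in for the "stable hash".

```typescript
// Hypothetical sketch; names are illustrative, not Crux internals.
type Input = Record<string, unknown>

// Serialize with sorted object keys so { a, b } and { b, a } give the same string.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') return String(JSON.stringify(value))
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`
  const obj = value as Record<string, unknown>
  const keys = Object.keys(obj).sort()
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(',')}}`
}

// Only fields declared by the context take part in the key.
function cacheKey(contextId: string, declaredFields: string[], input: Input): string {
  const picked: Input = {}
  for (const field of declaredFields) picked[field] = input[field]
  return `${contextId}:${stableStringify(picked)}`
}

interface Entry {
  text: string
  storedAt: number
}

class ResolverCache {
  private readonly entries = new Map<string, Entry>()
  private readonly now: () => number

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now
  }

  async resolve(
    contextId: string,
    declaredFields: string[],
    input: Input,
    ttlMs: number,
    systemFn: (input: Input) => Promise<string>,
  ): Promise<string> {
    const key = cacheKey(contextId, declaredFields, input)
    const hit = this.entries.get(key)
    // Cache hit: return the stored text without calling systemFn.
    if (hit && this.now() - hit.storedAt < ttlMs) return hit.text
    // Cache miss (or expired entry): run the resolver and store the result.
    const text = await systemFn(input)
    this.entries.set(key, { text, storedAt: this.now() })
    return text
  }
}
```

Because the key is built only from declared fields, adding an unrelated prompt-level field such as a locale does not invalidate the entry for a context that declares only orgId.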

Provider-Level (Token Caching)

When providerCache: true:

The resolution pipeline includes systemBlocks on ResolvedPrompt: one block per context, each with a providerCache flag
@crux/anthropic converts blocks to Anthropic TextBlockParam[] with cache_control: { type: 'ephemeral' } (up to 4 breakpoints)
@crux/google creates server-side CachedContent objects for the cacheable system prefix via Google's caching API, references them in generateContent() calls, and sends any uncached remainder as systemInstruction. The adapter handles creation, reuse, per-call TTL, deduplication of concurrent requests, and graceful fallback on errors automatically.
@crux/ai (Vercel AI SDK) converts blocks to SystemModelMessage[] with providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } } when the model is Anthropic
OpenAI prefix caching works automatically, because Crux's deterministic context ordering already provides stable prefixes
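
To make the Anthropic conversion concrete, here is a rough sketch of turning per-context blocks into text blocks with cache_control. SystemBlock and toAnthropicSystem are hypothetical names, and the rule for choosing which blocks keep a breakpoint when more than four are flagged is an assumption of this sketch, not documented adapter behavior.

```typescript
// Hypothetical input shape: one block per context with a providerCache flag.
interface SystemBlock {
  contextId: string
  text: string
  providerCache: boolean
}

// Mirrors the shape of an Anthropic text block that carries cache_control.
interface AnthropicTextBlock {
  type: 'text'
  text: string
  cache_control?: { type: 'ephemeral' }
}

const MAX_BREAKPOINTS = 4 // Anthropic's limit on cache breakpoints per request

function toAnthropicSystem(blocks: SystemBlock[]): AnthropicTextBlock[] {
  // Assumption: if more than four blocks are flagged, keep the last four,
  // since a later breakpoint covers a longer prefix.
  const flagged = blocks.map((b, i) => (b.providerCache ? i : -1)).filter((i) => i >= 0)
  const marked = new Set(flagged.slice(-MAX_BREAKPOINTS))
  return blocks.map((b, i) => {
    const out: AnthropicTextBlock = { type: 'text', text: b.text }
    if (marked.has(i)) out.cache_control = { type: 'ephemeral' }
    return out
  })
}
```

Block order is preserved, which matters for any prefix-based cache: a change in order changes the prefix and defeats the cache.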

Requirements

id is required when cacheTtl > 0 because the cache key is derived from it. If id is missing, an error is thrown at definition time.
Static string contexts silently skip TTL caching (there is no resolver to skip). providerCache still applies, and static content is ideal for provider prefix caching.
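
These two rules amount to a definition-time check. The sketch below is one way to express it, with hypothetical names (ContextDef, checkContextDef) and an invented error message. Whether a static context with a TTL and no id also throws is not stated above, so the order of the checks here is an assumption.

```typescript
// Hypothetical, simplified view of a context definition after cache normalization.
interface ContextDef {
  id?: string
  system: string | ((args: { input: unknown }) => Promise<string>)
  ttlMs: number
  providerCache: boolean
}

interface EffectiveCaching {
  resolverCaching: boolean
  providerCache: boolean
}

function checkContextDef(def: ContextDef): EffectiveCaching {
  // Rule 1: a TTL needs an id, because the cache key is derived from it.
  if (def.ttlMs > 0 && !def.id) {
    throw new Error('context: `id` is required when a cache TTL is set')
  }
  // Rule 2: a static string has no resolver to skip, so TTL caching is a no-op.
  const isStatic = typeof def.system === 'string'
  return {
    resolverCaching: !isStatic && def.ttlMs > 0,
    providerCache: def.providerCache, // applies to static and dynamic contexts alike
  }
}
```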

Observability

Cache events flow through the standard instrumentation pipeline:

Instrumentation hooks: onContextCacheHit and onContextCacheMiss fire during the compiled resolution pass with contextId, cache key, and age/resolution timing
Devtools: Cache hit/miss events appear in the trace timeline and the dashboard stats panel shows hit count, miss count, and hit rate
CLI TUI: The stats panel displays "Context Cache" with hit/miss counts and hit rate percentage
OTel: Cache events create crux.context.cache spans with status (hit/miss) and age_ms attributes
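
As an illustration of what a consumer of the two hooks might do, the sketch below keeps the same counters the stats panels display. The event payload shapes (CacheHitEvent, CacheMissEvent) and the createCacheStats helper are assumptions made for this example. Only the hook names onContextCacheHit and onContextCacheMiss come from the description above, and how the hooks are registered with Crux is not shown.

```typescript
// Hypothetical payloads; the real hook signatures may differ.
interface CacheHitEvent {
  contextId: string
  key: string
  ageMs: number
}

interface CacheMissEvent {
  contextId: string
  key: string
  resolutionMs: number
}

function createCacheStats() {
  let hits = 0
  let misses = 0
  return {
    onContextCacheHit(_event: CacheHitEvent): void {
      hits++
    },
    onContextCacheMiss(_event: CacheMissEvent): void {
      misses++
    },
    // Hit rate is hits / (hits + misses), reported as 0 before any event.
    snapshot(): { hits: number; misses: number; hitRate: number } {
      const total = hits + misses
      return { hits, misses, hitRate: total === 0 ? 0 : hits / total }
    },
  }
}
```

A persistently low hit rate for a context usually suggests that its TTL is too short or that its inputs vary too much for resolver caching to pay off.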
