Context Caching
Cache expensive context resolvers and leverage LLM provider token caching for cost savings.
Context resolvers that call databases, RAG systems, or APIs run on every prompt.resolve() call. The cache option on context() solves two problems at once:
- Application-level caching: skip the resolver function when the same inputs produce the same output
- Provider-level token caching: tell Anthropic to cache this content block (90% discount on cached input tokens) or benefit from OpenAI's automatic prefix caching (50% discount)
Quick Start
import { context } from '@crux/core'
import { z } from 'zod'
const brand = context({
  id: 'brand-voice',
  input: z.object({ orgId: z.string() }),
  system: async ({ input }) => {
    const data = await fetchBrandProfile(input.orgId) // expensive DB call
    return `## Brand Voice\n${data.guidelines}`
  },
  cache: 300_000, // 5 minutes; enables both caching layers
})
That's it. The first call runs the resolver and caches the result. Subsequent calls with the same orgId within 5 minutes skip the resolver entirely. The Anthropic adapter also marks this block with cache_control: { type: 'ephemeral' } for provider-level token caching.
Cache Option Forms
| Form | Resolver TTL | Provider Cache |
|---|---|---|
| cache: 300_000 | 300s | Yes |
| cache: true | 300s (5-minute default) | Yes |
| cache: { ttl: 60_000 } | 60s | Yes |
| cache: { ttl: 60_000, providerCache: false } | 60s | No |
| cache: { providerCache: true } | None | Yes |
| Not set / false | None | No |
The 5-minute default TTL matches Anthropic's cache window. Anthropic refreshes that window on every hit, so a context that is read at least once every 5 minutes stays cached.
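The table above can be read as a small normalization step. The following is a minimal sketch of that mapping, not the Crux implementation: CacheOption, NormalizedCache, and normalizeCache are hypothetical names, and the handling of an empty object is an assumption the table does not cover.

```typescript
// Hypothetical names for illustration; not taken from the Crux source.
type CacheOption = boolean | number | { ttl?: number; providerCache?: boolean }

interface NormalizedCache {
  cacheTtl: number // milliseconds; 0 means resolver caching is off
  providerCache: boolean
}

const DEFAULT_TTL_MS = 300_000 // the 5-minute default from the table

function normalizeCache(option?: CacheOption): NormalizedCache {
  // Not set / false: both layers off
  if (option === undefined || option === false) {
    return { cacheTtl: 0, providerCache: false }
  }
  // true: default TTL, provider cache on
  if (option === true) {
    return { cacheTtl: DEFAULT_TTL_MS, providerCache: true }
  }
  // number: that TTL, provider cache on
  if (typeof option === 'number') {
    return { cacheTtl: option, providerCache: true }
  }
  // Object form: a missing ttl means no resolver caching, and providerCache
  // defaults to on (assumed from the { ttl: 60_000 } row).
  return { cacheTtl: option.ttl ?? 0, providerCache: option.providerCache ?? true }
}
```

Under these assumptions, normalizeCache({ providerCache: true }) yields a zero TTL with the provider flag set, which corresponds to the "None | Yes" row.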
Fine-Grained Control
Use the object form when the two caching layers need different settings:
// Cache the resolver but DON'T mark as a provider cache breakpoint
// (content varies too much for prefix caching)
const searchResults = context({
  id: 'search',
  input: z.object({ query: z.string() }),
  system: async ({ input }) => ragSearch(input.query),
  cache: { ttl: 30_000, providerCache: false },
})
// Provider cache only: cheap to compute but stable across calls
const rules = context({
  id: 'rules',
  system: '## Rules\nAlways respond in JSON.',
  cache: { providerCache: true },
})
How It Works
Application-Level (Resolver Caching)
When cacheTtl > 0:
- A cache key is computed from contextId + a stable hash of the input fields declared in the context's inputSchema
- On cache hit: the cached text is returned, systemFn() is never called
- On cache miss: systemFn() runs, the result is stored with TTL
- Unrelated prompt-level input fields don't affect the cache key
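The hit/miss behavior described above can be sketched as follows. This is an illustration under stated assumptions, not Crux's code: cacheKey, stableStringify, and ResolverCache are hypothetical names, and a sorted-key JSON serialization stands in for the "stable hash" the text mentions.

```typescript
// Hypothetical helpers for illustration; not the Crux implementation.
interface CacheEntry {
  text: string
  expiresAt: number // epoch milliseconds
}

// Serialize with sorted object keys so key order never changes the result.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') return String(JSON.stringify(value))
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`
  const obj = value as Record<string, unknown>
  const keys = Object.keys(obj).sort()
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(',')}}`
}

// Only fields declared by the context contribute to the key, so unrelated
// prompt-level inputs cannot cause a miss.
function cacheKey(contextId: string, declaredFields: string[], input: Record<string, unknown>): string {
  const picked: Record<string, unknown> = {}
  for (const field of declaredFields) picked[field] = input[field]
  return `${contextId}:${stableStringify(picked)}`
}

class ResolverCache {
  private entries = new Map<string, CacheEntry>()
  private now: () => number

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now
  }

  async resolve(key: string, ttlMs: number, resolver: () => Promise<string>): Promise<string> {
    const hit = this.entries.get(key)
    if (hit && hit.expiresAt > this.now()) return hit.text // hit: resolver is skipped
    const text = await resolver() // miss: run the resolver and store with TTL
    this.entries.set(key, { text, expiresAt: this.now() + ttlMs })
    return text
  }
}
```

In this sketch, two concurrent misses for the same key both run the resolver. Deduplicating in-flight calls would need the pending promise to be stored as well.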
Provider-Level (Token Caching)
When providerCache: true:
- The resolution pipeline includes systemBlocks on ResolvedPrompt: one block per context, each with a providerCache flag
- @crux/anthropic converts blocks to Anthropic TextBlockParam[] with cache_control: { type: 'ephemeral' } (up to 4 breakpoints)
- @crux/google creates server-side CachedContent objects for the cacheable system prefix via Google's caching API, then references them in generateContent() calls while sending any uncached remainder as systemInstruction. Handles creation, reuse, per-call TTL, concurrency dedup, and graceful error fallback automatically.
- @crux/ai (Vercel AI SDK) converts blocks to SystemModelMessage[] with providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } } when the model is Anthropic
- OpenAI prefix caching works automatically: Crux's deterministic context ordering already provides stable prefixes
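To make the Anthropic conversion concrete, here is a minimal sketch using local stand-in types. SystemBlock and TextBlock only approximate the shapes named above, toAnthropicSystem is a hypothetical name, and the choice to keep the first four flagged blocks when more are flagged is an assumption; the adapter's actual selection rule is not described here.

```typescript
// Local stand-ins; the real types live in @crux/core and the Anthropic SDK.
interface SystemBlock {
  text: string
  providerCache: boolean
}

interface TextBlock {
  type: 'text'
  text: string
  cache_control?: { type: 'ephemeral' }
}

const MAX_BREAKPOINTS = 4 // Anthropic accepts at most 4 cache breakpoints per request

function toAnthropicSystem(blocks: SystemBlock[]): TextBlock[] {
  let used = 0
  return blocks.map((block) => {
    const out: TextBlock = { type: 'text', text: block.text }
    // Assumption: once the limit is reached, later flagged blocks are sent
    // as plain text blocks without a breakpoint.
    if (block.providerCache && used < MAX_BREAKPOINTS) {
      out.cache_control = { type: 'ephemeral' }
      used++
    }
    return out
  })
}
```

Block order is preserved, which matters because provider caches match on the prefix up to each breakpoint.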
Requirements
- id is required when cacheTtl > 0: it is needed for cache key derivation. An error is thrown at definition time if missing.
- Static string contexts silently skip TTL caching (nothing to cache). providerCache still applies: static content is ideal for provider prefix caching.
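These two rules could be enforced with a check along the following lines. This is a sketch only: ContextDefinition and checkCacheRequirements are hypothetical names, the error message is invented, and evaluating the static-string rule before the id rule is an assumption about ordering that the text above does not settle.

```typescript
// Hypothetical shape for illustration; not the Crux context type.
interface ContextDefinition {
  id?: string
  system: string | (() => Promise<string>)
  cacheTtl: number // milliseconds; 0 disables resolver caching
  providerCache: boolean
}

function checkCacheRequirements(def: ContextDefinition): ContextDefinition {
  // Static strings have no resolver to skip, so the TTL is dropped silently.
  // providerCache is left untouched.
  if (typeof def.system === 'string') {
    return { ...def, cacheTtl: 0 }
  }
  // Resolver caching needs an id to derive the cache key from.
  if (def.cacheTtl > 0 && !def.id) {
    throw new Error('context id is required when a cache TTL is set')
  }
  return def
}
```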
Observability
Cache events flow through the standard instrumentation pipeline:
- Instrumentation hooks: onContextCacheHit and onContextCacheMiss fire during the compiled resolution pass with contextId, cache key, and age/resolution timing
- Devtools: Cache hit/miss events appear in the trace timeline and the dashboard stats panel shows hit count, miss count, and hit rate
- CLI TUI: The stats panel displays "Context Cache" with hit/miss counts and hit rate percentage
- OTel: Cache events create crux.context.cache spans with status (hit/miss) and age_ms attributes
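As an illustration of consuming the two hooks, the sketch below keeps the same hit, miss, and hit-rate figures the stats panels report. The CacheEvent shape and createCacheStats are assumptions: the text above names the hooks and the data they carry, but not their exact signatures or how they are registered.

```typescript
// Assumed event shape; only contextId and the cache key are modeled here.
interface CacheEvent {
  contextId: string
  cacheKey: string
}

function createCacheStats() {
  let hits = 0
  let misses = 0
  return {
    // Handler names mirror the hooks named above; wiring them into Crux
    // is left out because the registration API is not shown in this page.
    onContextCacheHit(_event: CacheEvent): void {
      hits++
    },
    onContextCacheMiss(_event: CacheEvent): void {
      misses++
    },
    snapshot(): { hits: number; misses: number; hitRate: number } {
      const total = hits + misses
      // Report 0 rather than NaN before any event has been seen.
      return { hits, misses, hitRate: total === 0 ? 0 : hits / total }
    },
  }
}
```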