Same prompt. Same result. Every time.
AI failures usually hide in what gets sent to the model: stale context, missing memory, unsafe inputs, silent fallbacks, weak tests. Crux gives you typed building blocks for those pieces around the SDK you already use, so you can see what the model saw and fix the right layer.
context(): policy · priority · budget
memory(): recent · facts · episodes
retriever(): index → embed → rerank
guardrail(): pii · injection · safety
prompt({ use: [ ... ] }): One place for everything the model is allowed to see.
constrain(): zod · retry with feedback
generate(): your SDK · your model
evaluate(): suites · baselines
observe(): traces · devtools · OTel
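As one way to picture what "priority · budget" could mean for context, here is a minimal sketch that keeps the highest-priority blocks that fit a token budget and reports what was dropped. Every name here (`ContextBlock`, `estimateTokens`, `packContext`) is hypothetical and is not Crux's API; the four-characters-per-token estimate is a rough assumption as well.

```typescript
// Hypothetical shape for illustration only; not a Crux type.
interface ContextBlock {
  id: string;
  priority: number; // higher means more important
  text: string;
}

// Crude estimate (about 4 characters per token). A real tokenizer would replace this.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep the highest-priority blocks that fit the budget and report the rest.
function packContext(blocks: ContextBlock[], budget: number) {
  const kept: ContextBlock[] = [];
  const dropped: string[] = [];
  let used = 0;
  const byPriority = blocks.slice().sort((a, b) => b.priority - a.priority);
  for (const block of byPriority) {
    const cost = estimateTokens(block.text);
    if (used + cost <= budget) {
      kept.push(block);
      used += cost;
    } else {
      dropped.push(block.id); // visible, so a missing block is never a silent surprise
    }
  }
  return { kept, dropped, used };
}

const packedContext = packContext(
  [
    { id: "brand-voice", priority: 3, text: "Be concise and friendly." },
    {
      id: "old-thread",
      priority: 1,
      text: "Earlier the user asked about invoices, refunds, shipping times, and a discount code that expired last spring.",
    },
    { id: "user-facts", priority: 2, text: "The user prefers email follow-ups." },
  ],
  20,
);
console.log(packedContext.kept.map((b) => b.id), packedContext.dropped);
```

Returning the dropped ids alongside the kept blocks is the point of the sketch: a block that did not fit shows up in the result instead of silently vanishing from the model input.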
Why Crux
Bad LLM output is rarely a model problem.
The fix usually isn't the prompt and isn't the model. It's the missing memory, stale retrieval, dropped instruction, or test that should have caught the regression. Crux makes those parts explicit.
The full case for Crux →
Steerability
Guardrails, constraints, and fallbacks are declared before the call.
Composable context
Brand voice, memory, and retrieval stay reusable instead of pasted together.
Type safety
Zod schemas in. Typed objects out. Refactors stay real.
Observable by default
See what the model saw before you start guessing.
Modular by default
Opt in, never locked in.
Start with one typed building block, then add more only when your AI feature needs them. Prompts, context, memory, retrieval, guardrails, routing, tests, and traces stay modular, but they work together so you can see what the model saw and why.
Use one block first.
Replace a prompt string, add memory, or wrap retrieval. Each piece is its own import, and the rest stays out of your bundle.
Bring in the next fix when you need it.
Add safety, routing, tests, or traces when that part becomes the problem. You do not have to migrate into a framework first.
Keep the model input visible.
As pieces accumulate, Crux keeps the call understandable instead of turning your AI stack into a mystery box.
Composition
One array. Every block plugs in.
Memory, retrieval, guardrails. One prompt.
A Crux prompt has a single use: [ ... ] array. Drop any combination of blocks into it; they add context, tools, and checks without scattering logic across the app. The SDK still makes the call.
Resolve: Blocks become the prompt the model actually sees.
Adapt: Translated to the provider or runner you point at.
Validate: Output is checked against the declared schema.
Observe: Traces show what happened when the call ran.

```typescript
import { z } from 'zod'
import { openai } from '@ai-sdk/openai' // assumed: the Vercel AI SDK provider package
import { prompt } from '@use-crux/core'
import { memory, recentMessages, facts } from '@use-crux/core/memory'
import { retriever } from '@use-crux/core/retrieval'
import { guardrail } from '@use-crux/core/safety'
import { generate } from '@use-crux/ai'

// `store` and `detectPII` are not defined in this snippet; they come from your app.

// Memory: recent messages plus stored facts about the user
const chat = memory({
  store,
  blocks: [recentMessages(), facts({ id: 'about-user' })],
})

// Retrieval: query the store with the incoming message
const docs = retriever({ store, query: q => q.message })

// Guardrail: check input for PII before the call
const pii = guardrail({
  name: 'pii',
  phase: 'input',
  validate: detectPII,
})

const reply = prompt({
  use: [chat, docs, pii],
  input: z.object({ message: z.string() }),
  output: z.object({ answer: z.string() }),
  system: 'Answer using memory and retrieved docs.',
})

const result = await generate(reply, {
  model: openai('gpt-4o'),
  input: { message: 'What did we agree on?' },
})
// result.object.answer is typed, traced, and safe
```
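To make the Resolve step concrete, the sketch below shows one possible way an array of blocks could turn into the text the model sees. It is a toy model and not Crux's implementation: `Block`, `Resolved`, and `resolveBlocks` are hypothetical names, and the PII check is a deliberately naive regex.

```typescript
// Hypothetical block shape for illustration; Crux's real block types may differ.
interface Block {
  name: string;
  system?: string; // text this block contributes to the system prompt
  checkInput?: (message: string) => string | null; // rejection reason, or null if fine
}

interface Resolved {
  system: string;
  blocked: { by: string; reason: string } | null;
}

function resolveBlocks(base: string, use: Block[], message: string): Resolved {
  // Input checks run first, so a rejected message never reaches the model.
  for (const block of use) {
    const reason = block.checkInput ? block.checkInput(message) : null;
    if (reason !== null) {
      return { system: "", blocked: { by: block.name, reason } };
    }
  }
  // Every contributing block appends its text, in array order.
  const parts: string[] = [base];
  for (const block of use) {
    if (block.system !== undefined) parts.push(block.system);
  }
  return { system: parts.join("\n\n"), blocked: null };
}

const memoryBlock: Block = {
  name: "memory",
  system: "Known facts: the user is on the Pro plan.",
};
const piiBlock: Block = {
  name: "pii",
  // Naive stand-in for a real detector: flags strings shaped like a US SSN.
  checkInput: (m) => (/\b\d{3}-\d{2}-\d{4}\b/.test(m) ? "looks like a US SSN" : null),
};

const accepted = resolveBlocks("Answer using memory.", [memoryBlock, piiBlock], "What did we agree on?");
const rejected = resolveBlocks("Answer using memory.", [memoryBlock, piiBlock], "My SSN is 123-45-6789");
console.log(accepted.system);
console.log(rejected.blocked);
```

Because the resolved system text is a plain value, it can be logged or diffed before any model call, which is the property the Observe step relies on.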
SDK-agnostic by default
Bring your SDK. Keep your stack.
Define your prompt once with typed schemas. The call site decides who answers: your provider SDK, your in-house client, or your agent framework. Crux composes around it instead of replacing it.
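The idea of "define once, decide at the call site" can be sketched as a prompt definition that knows nothing about any SDK, plus a runner passed in when you call it. The names below (`PromptDef`, `Runner`, `runPrompt`) are hypothetical, and both runners are fakes standing in for real provider clients, which would be asynchronous.

```typescript
// Hypothetical types: a prompt definition with no knowledge of any SDK.
interface PromptDef<I> {
  system: string;
  render: (input: I) => string;
}

// A runner is whatever actually talks to a model. Real ones would return a Promise.
type Runner = (system: string, user: string) => string;

// The call site picks the runner; the definition stays unchanged.
function runPrompt<I>(def: PromptDef<I>, input: I, runner: Runner): string {
  return runner(def.system, def.render(input));
}

const summarize: PromptDef<{ text: string }> = {
  system: "Summarize in one sentence.",
  render: ({ text }) => `Text: ${text}`,
};

// Two fake runners standing in for different SDKs or an in-house client.
const fakeProviderA: Runner = (system, user) => `[A] ${system} | ${user}`;
const fakeProviderB: Runner = (_system, user) => `[B] ${user}`;

console.log(runPrompt(summarize, { text: "hi" }, fakeProviderA));
console.log(runPrompt(summarize, { text: "hi" }, fakeProviderB));
```

Swapping `fakeProviderA` for `fakeProviderB` changes who answers without touching `summarize`, which is the separation this section describes.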
prompt()
Typed, SDK-agnostic definition
@use-crux/ai
Vercel AI SDK
@use-crux/openai
OpenAI SDK
@use-crux/anthropic
Anthropic SDK
@use-crux/google
Google GenAI
@use-crux/core/ai-agent
Agent frameworks
Composition
When one call isn’t enough.
The same building blocks scale up when one model call becomes a workflow: sequential steps, parallel work, voting, handoffs, and routing. Add the pattern you need without giving up your SDK.
parallel(): fan out, gather
pipeline(): sequential, typed handoffs
consensus(): vote with quorum
swarm(): peer-to-peer routing
blackboard(): shared typed state
handoff(): schema-validated transfer
delegate(): agent as callable tool
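Of the patterns above, "vote with quorum" is the easiest to show in a few lines. This is a sketch of the general technique and not Crux's `consensus()`: the `consensusVote` function and its fractional `quorum` parameter are assumptions made for illustration, and the votes would normally come from several model calls.

```typescript
interface Tally {
  winner: string | null; // null when no answer reaches the quorum
  counts: Record<string, number>;
}

// `quorum` is the fraction of votes (0 to 1) the leading answer must reach.
function consensusVote(votes: string[], quorum: number): Tally {
  const counts: Record<string, number> = {};
  for (const vote of votes) {
    counts[vote] = (counts[vote] ?? 0) + 1;
  }
  let winner: string | null = null;
  let best = 0;
  for (const answer of Object.keys(counts)) {
    const n = counts[answer] ?? 0;
    if (n > best) {
      best = n;
      winner = answer;
    }
  }
  const reached = votes.length > 0 && best / votes.length >= quorum;
  return { winner: reached ? winner : null, counts };
}

// Two of three voters agree, which clears a 60% quorum.
console.log(consensusVote(["refund", "refund", "escalate"], 0.6).winner);
// A three-way split does not.
console.log(consensusVote(["refund", "escalate", "ignore"], 0.6).winner);
```

Returning `null` instead of the plurality answer keeps a split vote distinguishable from agreement, so the caller can decide whether to retry, escalate, or fall back.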
Observability
See why the answer happened.
When an answer is wrong, you need more than the final text. Crux shows the prompt, context, memory, retrieval, tools, safety checks, cost, and traces behind the call, locally in devtools or in your production telemetry.
Visual devtools
Live trace timeline, resolved system prompt preview, memory operations, and rolling quality averages. The web UI and the terminal dashboard show the same data.
OpenTelemetry
Send spans to Datadog, Honeycomb, Grafana, or any OTel-compatible platform. Works in Lambda, Convex, and Cloudflare Workers.
Both are built on the plugin system: composable, zero-overhead when disabled, and extensible with custom plugins.
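One common way to get close to "zero-overhead when disabled" is to build trace events lazily, so that nothing is constructed when no plugin is listening. The sketch below illustrates that general technique under assumed names (`TraceEvent`, `TracePlugin`, `createTracer`); it does not describe how Crux's plugin system is implemented.

```typescript
// Hypothetical event and plugin shapes for illustration.
interface TraceEvent {
  name: string;
  durationMs: number;
}
type TracePlugin = (event: TraceEvent) => void;

function createTracer(plugins: TracePlugin[]) {
  const enabled = plugins.length > 0;
  return {
    // The event is built lazily: with no plugins, the cost is one boolean check.
    emit(build: () => TraceEvent): void {
      if (!enabled) return;
      const traceEvent = build();
      for (const plugin of plugins) plugin(traceEvent);
    },
  };
}

const collected: string[] = [];
const quietTracer = createTracer([]);
const liveTracer = createTracer([(e) => collected.push(`${e.name}:${e.durationMs}ms`)]);

quietTracer.emit(() => ({ name: "generate", durationMs: 120 })); // builder never runs
liveTracer.emit(() => ({ name: "generate", durationMs: 120 }));
console.log(collected);
```

A custom plugin in this model is just another function in the array, so a devtools view and a telemetry exporter can receive the same events side by side.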
Evaluation
Test the setup, not just the answer.
Catch regressions before users do. Put expected cases next to the prompt, run them in CI, compare baselines, and test the setup around the model instead of only grading the final text.
```typescript
export default evaluate('reply.quality', {
  task: reply,
  data: [
    { name: 'remembers facts', input: { message: 'What did we agree on?' } },
    { name: 'refuses off-topic', input: { message: 'Tell me a joke' } },
  ],
  expect: (ctx) => {
    ctx.expect(ctx.output.answer).toContain('demo')
  },
  scorers: [scorers.judge({ name: 'support_fit', rubric })],
  gates: { support_fit: { min: 0.8 } },
})
```
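A gate such as `support_fit: { min: 0.8 }` amounts to a threshold check that a CI run can fail on. The sketch below shows one plausible reading, in which per-case scores are averaged and compared with the minimum; the averaging rule and the `checkGates` helper are assumptions, not Crux's documented behavior.

```typescript
type Gates = Record<string, { min: number }>;

// Assumption: a gate compares the mean score across cases with its minimum.
function checkGates(scores: Record<string, number[]>, gates: Gates) {
  const failures: string[] = [];
  for (const gateName of Object.keys(gates)) {
    const values = scores[gateName] ?? [];
    // A gate with no scores counts as 0, so a missing scorer fails loudly.
    const mean =
      values.length === 0 ? 0 : values.reduce((sum, v) => sum + v, 0) / values.length;
    const min = gates[gateName].min;
    if (mean < min) {
      failures.push(`${gateName}: mean ${mean.toFixed(2)} < ${min}`);
    }
  }
  return { passed: failures.length === 0, failures };
}

const gates: Gates = { support_fit: { min: 0.8 } };
console.log(checkGates({ support_fit: [0.9, 0.8] }, gates)); // mean is about 0.85
console.log(checkGates({ support_fit: [0.9, 0.6] }, gates)); // mean is 0.75
```

In CI, a result with `passed: false` would typically map to a non-zero exit code, and the `failures` messages tell you which scorer regressed.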
Built-in judges
faithfulness: Does the answer stay grounded in the retrieved context?
relevance: Is the model addressing what the user actually asked?
safety: Pre-built checks for refusal, leakage, off-topic drift.
The trade
What you put down. What you pick up.
Crux doesn't replace your SDK. It sits alongside it: it organizes the pieces around the call, validates the result, shows what happened, and gets out of the way.
Start with one block.
Add the rest when you need them. Bring your SDK. No runtime to adopt. Nothing to migrate away from.