Harness engineering toolkit

Same prompt. Same result. Every time.

AI failures usually hide in what gets sent to the model: stale context, missing memory, unsafe inputs, silent fallbacks, weak tests. Crux gives you typed building blocks for those pieces around the SDK you already use, so you can see what the model saw and fix the right layer.

Get Started · Read the Docs

$ npm install @use-crux/core

context()

policy · priority · budget

memory()

recent · facts · episodes

retriever()

index → embed → rerank

guardrail()

pii · injection · safety

prompt({ use: [ ... ] })

One place for everything the model is allowed to see.

constrain()

zod · retry with feedback

generate()

your SDK · your model

evaluate()

suites · baselines

observe()

traces · devtools · OTel

Add one piece · See the whole call

Why Crux

Bad LLM output is rarely a model problem.

The fix usually isn't the prompt and isn't the model. It's the missing memory, stale retrieval, dropped instruction, or test that should have caught the regression. Crux makes those parts explicit.

The full case for Crux →

Steerability

Guardrails, constraints, and fallbacks are declared before the call.

Composable context

Brand voice, memory, and retrieval stay reusable instead of pasted together.

Type safety

Zod schemas in. Typed objects out. Refactors stay real.

Observable by default

See what the model saw before you start guessing.
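As a rough illustration of "schemas in, typed objects out" with retry on failure, here is a minimal sketch. It is not Crux code: `parseAnswer` is a hand-written stand-in for a Zod schema, and `generateWithRetry` and `callModel` are hypothetical names for whatever SDK call you already have.

```typescript
// Hypothetical sketch: none of these names come from Crux.
type Answer = { answer: string }

type ParseResult = { ok: true; value: Answer } | { ok: false; issues: string[] }

// Stand-in for a Zod schema: returns the typed value or a list of problems.
function parseAnswer(raw: string): ParseResult {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    return { ok: false, issues: ['output is not valid JSON'] }
  }
  if (typeof data !== 'object' || data === null || typeof (data as { answer?: unknown }).answer !== 'string') {
    return { ok: false, issues: ['expected an object with a string "answer" field'] }
  }
  return { ok: true, value: { answer: (data as { answer: string }).answer } }
}

// `callModel` receives the prompt text and returns the raw model output.
async function generateWithRetry(
  callModel: (prompt: string) => Promise<string>,
  prompt: string,
  maxAttempts = 3,
): Promise<Answer> {
  let currentPrompt = prompt
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const parsed = parseAnswer(await callModel(currentPrompt))
    if (parsed.ok) return parsed.value
    // Feed the validation problems back so the next attempt can correct them.
    currentPrompt = `${prompt}\n\nYour previous output was rejected: ${parsed.issues.join('; ')}. Reply with JSON only.`
  }
  throw new Error(`no valid output after ${maxAttempts} attempts`)
}
```

The caller only ever receives an `Answer` or an error, so downstream code never handles unvalidated text.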

Modular by default

Opt in, never locked in.

Start with one typed building block, then add more only when your AI feature needs them. Prompts, context, memory, retrieval, guardrails, routing, tests, and traces stay modular, but they work together so you can see what the model saw and why.

START SMALL

Use one block first.

Replace a prompt string, add memory, or wrap retrieval. Each piece is its own import, and the rest stays out of your bundle.

ADD WHAT HURTS

Bring in the next fix when you need it.

Add safety, routing, tests, or traces when that part becomes the problem. You do not have to migrate into a framework first.

SEE THE CALL

Keep the model input visible.

As pieces accumulate, Crux keeps the call understandable instead of turning your AI stack into a mystery box.

Composition

One array. Every block plugs in.

Memory, retrieval, guardrails. One prompt. A Crux prompt has a single use: array. Drop any combination of blocks into it; they add context, tools, and checks without scattering logic across the app. The SDK still makes the call.

Resolve: Blocks become the prompt the model actually sees.

Adapt: Translated to the provider or runner you point at.

Validate: Output is checked against the declared schema.

Observe: Traces show what happened when the call ran.
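To make the Resolve step concrete, here is a minimal sketch of how blocks with a priority could be packed into a token budget. The `ContextBlock` shape, `estimateTokens`, and `resolveBlocks` are assumptions made for illustration and may not match how Crux resolves blocks internally.

```typescript
// Hypothetical shapes for illustration; not the Crux API.
interface ContextBlock {
  name: string
  text: string
  priority: number // higher wins when the budget is tight
}

// Rough estimate (about 4 characters per token); a real tokenizer would be more accurate.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4)

function resolveBlocks(blocks: ContextBlock[], budget: number): { prompt: string; dropped: string[] } {
  // Consider the most important blocks first, remembering each block's original position.
  const byPriority = blocks
    .map((block, index) => ({ block, index }))
    .sort((a, b) => b.block.priority - a.block.priority)

  const keptIndexes: number[] = []
  const dropped: string[] = []
  let used = 0
  for (const { block, index } of byPriority) {
    const cost = estimateTokens(block.text)
    if (used + cost <= budget) {
      keptIndexes.push(index)
      used += cost
    } else {
      dropped.push(block.name) // recorded so a trace can show what the model did not see
    }
  }

  // Emit kept blocks in their original order so the prompt layout stays stable.
  keptIndexes.sort((a, b) => a - b)
  const prompt = keptIndexes.map(i => blocks[i].text).join('\n\n')
  return { prompt, dropped }
}
```

Returning the dropped block names alongside the prompt is the design point: a missing instruction becomes something you can read in a trace rather than something you have to guess.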

reply.ts

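import { z } from 'zod'
// Assumed: `openai` is the Vercel AI SDK OpenAI provider.
import { openai } from '@ai-sdk/openai'
import { prompt } from '@use-crux/core'
import { memory, recentMessages, facts } from '@use-crux/core/memory'
import { retriever } from '@use-crux/core/retrieval'
import { guardrail } from '@use-crux/core/safety'
import { generate } from '@use-crux/ai'

// `store` and `detectPII` come from your own app: a storage backend and a PII check.

// Memory: recent conversation turns plus stored facts about the user.
const chat = memory({
  store,
  blocks: [recentMessages(), facts({ id: 'about-user' })],
})

// Retrieval: look up documents using the incoming message as the query.
const docs = retriever({ store, query: q => q.message })

// Guardrail: check the input for PII before the model sees it.
const pii = guardrail({
  name: 'pii', phase: 'input',
  validate: detectPII,
})

// One prompt: every block plugs into the `use` array.
const reply = prompt({
  use: [chat, docs, pii],
  input:  z.object({ message: z.string() }),
  output: z.object({ answer:  z.string() }),
  system: 'Answer using memory and retrieved docs.',
})

const result = await generate(reply, {
  model: openai('gpt-4o'),
  input: { message: 'What did we agree on?' },
})

// result.object.answer is typed, traced, and safe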
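The example above passes a `detectPII` function to the guardrail without showing it. A minimal sketch of such a check might look like the following. The `PiiResult` return shape is an assumption, not the contract `guardrail()` actually expects, and the two regexes are deliberately simple.

```typescript
// Hypothetical helper: the return shape is an assumption, not the Crux guardrail contract.
interface PiiResult {
  ok: boolean
  findings: string[]
}

// Deliberately simple patterns; production PII detection needs far more than regexes.
const PII_PATTERNS: Array<{ kind: string; pattern: RegExp }> = [
  { kind: 'email', pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  { kind: 'usSsn', pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
]

function detectPII(text: string): PiiResult {
  // Collect the kind of every pattern that matches somewhere in the text.
  const findings = PII_PATTERNS
    .filter(({ pattern }) => pattern.test(text))
    .map(({ kind }) => kind)
  return { ok: findings.length === 0, findings }
}
```

Reporting which kinds matched, rather than a bare boolean, gives a trace something useful to show when an input is blocked.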

SDK-agnostic by default

Bring your SDK. Keep your stack.

Define your prompt once with typed schemas. The call site decides who answers: your provider SDK, your in-house client, or your agent framework. Crux composes around it instead of replacing it.

prompt()

Typed, SDK-agnostic definition

.resolve()

@use-crux/ai

Vercel AI SDK

@use-crux/openai

OpenAI SDK

@use-crux/anthropic

Anthropic SDK

@use-crux/google

Google GenAI

@use-crux/core/ai-agent

Agent frameworks
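One way to picture the adapter layer: the same resolved prompt is reshaped for each provider at the call site. The sketch below is simplified and hypothetical. `ResolvedPrompt`, `toChatMessages`, and `toSystemField` are illustrative names, and real requests carry more fields (model, token limits, tools).

```typescript
// Hypothetical types: a provider-neutral prompt and two request shapes.
interface ResolvedPrompt {
  system: string
  user: string
}

// Chat-completions style: the system text travels as the first message.
function toChatMessages(p: ResolvedPrompt) {
  return {
    messages: [
      { role: 'system' as const, content: p.system },
      { role: 'user' as const, content: p.user },
    ],
  }
}

// Messages-API style: the system text is a top-level field next to the messages.
function toSystemField(p: ResolvedPrompt) {
  return {
    system: p.system,
    messages: [{ role: 'user' as const, content: p.user }],
  }
}
```

Because the prompt definition never mentions a provider, switching providers changes which adapter runs, not the prompt.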

Composition

When one call isn’t enough.

The same building blocks scale up when one model call becomes a workflow: sequential steps, parallel work, voting, handoffs, and routing. Add the pattern you need without giving up your SDK.

parallel()

fan out, gather

pipeline()

sequential, typed handoffs

consensus()

vote with quorum

swarm()

peer-to-peer routing

blackboard()

shared typed state

handoff()

schema-validated transfer

delegate()

agent as callable tool

All composition patterns
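To give a feel for two of the patterns listed above, here is a minimal sketch of fan-out plus voting with a quorum, written with plain promises. `tally` and `consensusOf` are hypothetical names that only echo the list; they are not the Crux `parallel()` or `consensus()` functions.

```typescript
// Hypothetical sketch of fan-out plus voting; not the Crux API.

// Returns the first value to collect `quorum` votes, or undefined if none does.
// Votes are compared by identity, so this suits primitives such as strings.
function tally<T>(votes: T[], quorum: number): T | undefined {
  const counts = new Map<T, number>()
  let winner: T | undefined
  for (const vote of votes) {
    const count = (counts.get(vote) ?? 0) + 1
    counts.set(vote, count)
    if (count >= quorum && winner === undefined) winner = vote
  }
  return winner
}

async function consensusOf<T>(voters: Array<() => Promise<T>>, quorum: number): Promise<T | undefined> {
  // Fan out: start every voter at once, then gather all results.
  const votes = await Promise.all(voters.map(run => run()))
  return tally(votes, quorum)
}
```

Each voter can be any function that returns a promise, which is how a pattern like this can wrap calls made through the SDK you already use.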

Observability

See why the answer happened.

When an answer is wrong, you need more than the final text. Crux shows the prompt, context, memory, retrieval, tools, safety checks, cost, and traces behind the call, locally in devtools or in your production telemetry.

Development

Visual devtools

Live trace timeline, a preview of the resolved system prompt, memory operations, and rolling quality averages. The web UI and terminal dashboard show the same data.

crux dev · crux traces · crux quality

Production

OpenTelemetry

Send spans to Datadog, Honeycomb, Grafana, or any OTel-compatible platform. Works in Lambda, Convex, and Cloudflare Workers.

Datadog · Honeycomb · Grafana · New Relic

Both are built on the plugin system: composable, zero-overhead when disabled, and extensible with custom plugins.
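The plugin idea can be sketched in a few lines: a tracer that hands spans to whatever plugins are registered and skips the bookkeeping when there are none. The `Span` and `TracePlugin` shapes and `createTracer` are assumptions for illustration, and the sketch covers synchronous work only. Crux's real plugin interface may differ.

```typescript
// Hypothetical plugin shape; Crux's real plugin interface may differ.
interface Span {
  name: string
  durationMs: number
  attributes: Record<string, unknown>
}

interface TracePlugin {
  onSpan(span: Span): void
}

function createTracer(plugins: TracePlugin[]) {
  return function traced<T>(name: string, attributes: Record<string, unknown>, run: () => T): T {
    // With no plugins registered, skip timing entirely and just run the work.
    if (plugins.length === 0) return run()
    const start = Date.now()
    try {
      return run()
    } finally {
      // Report the span even if `run` throws, so failures show up in the trace.
      const span: Span = { name, durationMs: Date.now() - start, attributes }
      for (const plugin of plugins) plugin.onSpan(span)
    }
  }
}
```

A devtools plugin might keep spans in memory for a local timeline, while a production plugin could forward the same spans to an OpenTelemetry exporter.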

See the devtools in action →

Evaluation

Test the setup, not just the answer.

Catch regressions before users do. Put expected cases next to the prompt, run them in CI, compare baselines, and test the setup around the model instead of only grading the final text.

reply.eval.ts

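// `evaluate`, `scorers`, and `rubric` are assumed to be imported or defined elsewhere;
// `reply` is the prompt defined in reply.ts.
export default evaluate('reply.quality', {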
  task: reply,
  data: [
    { name: 'remembers facts', input: { message: 'What did we agree on?' } },
    { name: 'refuses off-topic', input: { message: 'Tell me a joke' } },
  ],
  expect: (ctx) => {
    ctx.expect(ctx.output.answer).toContain('demo')
  },
  scorers: [scorers.judge({ name: 'support_fit', rubric })],
  gates: { support_fit: { min: 0.8 } },
})

$ crux quality run reply.quality
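For intuition about gates and baselines, here is a minimal sketch of the check a CI run could perform once scores are collected. The `Scores` and `Gates` shapes and `checkGates` are assumptions that echo the `gates` option in the suite above; they do not describe how `crux quality run` is implemented.

```typescript
// Hypothetical shapes; they echo the `gates` option above but are not the Crux API.
type Scores = Record<string, number[]>        // scorer name -> one score per case
type Gates = Record<string, { min: number }>  // scorer name -> minimum mean score

const mean = (xs: number[]): number => xs.reduce((sum, x) => sum + x, 0) / xs.length

function checkGates(scores: Scores, gates: Gates, baseline: Record<string, number> = {}) {
  const failures: string[] = []
  const regressions: string[] = []
  for (const name of Object.keys(gates)) {
    const values = scores[name]
    if (!values || values.length === 0) {
      failures.push(`${name}: no scores recorded`)
      continue
    }
    const avg = mean(values)
    if (avg < gates[name].min) failures.push(`${name}: mean ${avg.toFixed(2)} is below ${gates[name].min}`)
    // A gate can pass and still be worse than the last accepted run.
    if (name in baseline && avg < baseline[name]) regressions.push(name)
  }
  return { passed: failures.length === 0, failures, regressions }
}
```

Keeping hard failures separate from regressions lets CI block on the first and merely flag the second.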

Built-in judges

faithfulness: Does the answer stay grounded in the retrieved context?

relevance: Is the model addressing what the user actually asked?

safety: Pre-built checks for refusal, leakage, and off-topic drift.

Runs across

gpt-4o · claude-sonnet · gemini · gpt-4o-mini · haiku · your model

The trade

What you put down. What you pick up.

Crux doesn't replace your SDK. It sits alongside, organizes the pieces around the call, validates the result, shows what happened, and gets out of the way.

Inline prompt strings → A typed prompt() with schemas

Manual string concatenation → Reusable context blocks with priority and budget

Provider-specific call sites → Bring your SDK, keep the structure

console.log of request bodies → A clear view of what the model saw

Output-only evals → Tests for the setup around the answer

A mandatory framework → Small primitives, composed by you

Start with one block.

Add the rest when you need them. Bring your SDK. No runtime to adopt. Nothing to migrate away from.

$alpha: public packages pending