One canonical event graph.
Crux instruments the whole harness around your LLM call. Memory ops, retrieval, guardrails, routing, evals. One structured event stream, rendered locally in devtools and exported as OpenTelemetry in production.
Recent runs: reply 00:14 · triage-swarm 00:12 · draft-edit 00:09 · classify 00:08 · reply 00:04 · summarize 00:02 · reply 00:00
prompt:reply · generate · gpt-4o · OK
Spans:
resolve
memory.read · recent
memory.read · facts
retriever · embed
retriever · search
guardrail · pii
router · cascade
generate · gpt-4o
constrain · zod
memory.write
observe.emit
Resolved system text:
<context id="brand" priority=30> Write in a casual tone.
<context id="memory" priority=20> Recent: 4 messages.
<context id="docs" priority=10> Docs available.
Answer using memory and retrieved docs.
Output:
{
  answer: 'You asked me to remind you about the demo.'
}
span_id c01e…
Why
Logs are a flashlight in a haystack.
When an LLM feature flakes in production, console.log of the request body isn't enough. You need the resolved system text, the retrieval that ran, the guardrail that fired, the model that answered, the schema that did or didn't parse, end to end, in one place.
Every block
Memory, retrieval, guardrails, routing, generate, eval. All emit the same shape.
Zero overhead when disabled
Plugins read the event stream. No subscribers, no cost.
Same in dev and prod
The local devtools and your OTel exporter are reading the same graph.
Trace timeline
Every span. Every retry.
Crux groups spans into before / call / after lanes so a flaky retrieval doesn’t hide behind the model latency. Click any span for resolved inputs, outputs, and parents.
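To make the lane idea concrete, here is a minimal sketch of how spans could be bucketed into before / call / after. The `Span` shape, the `kind` field, the ordering rule, and the durations are all assumptions made for the example; they are not the actual Crux record types or its grouping logic.

```typescript
// Hypothetical span shape for illustration; not the actual Crux record type.
type Lane = 'before' | 'call' | 'after'

interface Span {
  name: string
  kind: string // e.g. 'memory.read', 'retriever', 'generate'
  durationMs: number
}

// Assumption: spans arrive ordered by start time, and every span of kind
// 'generate' is a model call. Spans seen before the first model call go to
// 'before'; non-call spans seen after it go to 'after'.
function groupIntoLanes(spans: Span[]): Record<Lane, Span[]> {
  const lanes: Record<Lane, Span[]> = { before: [], call: [], after: [] }
  let seenCall = false
  for (const span of spans) {
    if (span.kind === 'generate') {
      lanes.call.push(span)
      seenCall = true
    } else if (seenCall) {
      lanes.after.push(span)
    } else {
      lanes.before.push(span)
    }
  }
  return lanes
}

// Slowest span in a lane, or undefined when the lane is empty.
function slowest(spans: Span[]): Span | undefined {
  let worst: Span | undefined
  for (const span of spans) {
    if (worst === undefined || span.durationMs > worst.durationMs) worst = span
  }
  return worst
}

// Durations are made-up numbers for the example.
const lanes = groupIntoLanes([
  { name: 'memory.read · recent', kind: 'memory.read', durationMs: 24 },
  { name: 'retriever · search', kind: 'retriever', durationMs: 410 },
  { name: 'guardrail · pii', kind: 'guardrail', durationMs: 12 },
  { name: 'generate · gpt-4o', kind: 'generate', durationMs: 620 },
  { name: 'constrain · zod', kind: 'constrain', durationMs: 3 },
  { name: 'memory.write', kind: 'memory.write', durationMs: 15 },
])

const slowestBefore = slowest(lanes.before) // the 410ms retriever span
```

Looking at the slowest span per lane, rather than across the whole trace, is what keeps a slow retrieval visible even when the model call dominates total latency.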
prompt:reply · trace_id 4a7f…be0c · 1.21s total · 892 tokens · gpt-4o · OK
memory.read · recent
memory.read · facts
retriever · embed
retriever · search
guardrail · pii
generate · gpt-4o
constrain · zod
memory.write
evaluate · judge
Inspect the harness
What was read. What was written. How it scored.
Each block surfaces its own inspector panel. Memory shows the keys that were read and the writes that landed. Eval shows the judges that ran across the matrix.
memory.recentMessages: read · 24ms · 4 items
memory.facts · about-user: read · 18ms · 3 keys
{
  timezone: 'Europe/Amsterdam',
  preferences: { tone: 'casual' },
  upcoming: [ { type: 'demo', day: 'Thursday' } ],
}
faithfulness: 0.93 (+0.02 vs yesterday)
relevance: 0.87 (−0.04 vs yesterday)
safety: 0.99 (0.00 vs yesterday)
Rolling average · 24h
By model · today: gpt-4o, claude-sonnet, gemini, gpt-4o-mini
Faithfulness · Relevance · Safety
Cost & terminal
Tokens are spans. So are dollars.
Install withCostTracking() and model spend rolls up through the same event graph as everything else: by prompt, by model, by flow, by session. View it in devtools, in the crux cost CLI, or live in a terminal dashboard.
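As a rough illustration of what "rolling up" spend from spans means, the sketch below totals cost by prompt or by model from token counts. The `UsageSpan` shape, the model names, and the prices are placeholders invented for the example; they do not reflect what withCostTracking() does internally or any real price list.

```typescript
// Hypothetical usage record; field names are assumptions, not the Crux schema.
interface UsageSpan {
  prompt: string
  model: string
  inputTokens: number
  outputTokens: number
}

// Placeholder prices in USD per 1M tokens. Made-up numbers for the example.
const PRICES: Record<string, { input: number; output: number }> = {
  'model-a': { input: 2, output: 8 },
  'model-b': { input: 0.5, output: 1 },
}

// Cost of a single span; unknown models are counted as zero spend.
function spanCostUsd(span: UsageSpan): number {
  const price = PRICES[span.model]
  if (price === undefined) return 0
  return (span.inputTokens * price.input + span.outputTokens * price.output) / 1_000_000
}

// Sum span costs under one dimension of the span ('prompt' or 'model').
function rollUp(spans: UsageSpan[], key: 'prompt' | 'model'): Record<string, number> {
  const totals: Record<string, number> = {}
  for (const span of spans) {
    const bucket = span[key]
    totals[bucket] = (totals[bucket] ?? 0) + spanCostUsd(span)
  }
  return totals
}

const usage: UsageSpan[] = [
  { prompt: 'reply', model: 'model-a', inputTokens: 1_000_000, outputTokens: 500_000 },
  { prompt: 'summarize', model: 'model-b', inputTokens: 2_000_000, outputTokens: 0 },
  { prompt: 'reply', model: 'model-b', inputTokens: 0, outputTokens: 1_000_000 },
]

const byPrompt = rollUp(usage, 'prompt') // { reply: 7, summarize: 1 }
const byModel = rollUp(usage, 'model') // { 'model-a': 6, 'model-b': 2 }
```

Because cost is derived per span, the same records can be grouped by any dimension they carry, which is the point of treating dollars as spans.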
Week: $314.62 (−18% vs last)
Top prompts: reply, triage-swarm, draft-edit, summarize, classify
┏━ crux dev --tui · live ━━━━━━━━━━━━━━━━━━━━━━━━┓
┃
┃ traces/min 142    p95 1.4s    err 0.3%
┃ tokens/min 89.2k  $/h 4.18
┣━ recent ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ ● reply          1.21s  892t  gpt-4o
┃ ● triage-swarm   3.40s  4.1k  gpt-4o, claude
┃ ● draft-edit     0.74s  1.3k  gpt-4o
┃ ✕ reply          2.10s  1.6k  security.injection
┃ ● summarize      0.98s  780   gpt-4o-mini
┣━ judges (last 100) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ faithfulness  ████████████████░░░ 0.93
┃ relevance     ██████████████░░░░░ 0.87
┃ safety        ███████████████████░ 0.99
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
j/k navigate · / filter · enter inspect · i index · q quit
In production
OpenTelemetry, on day one.
One adapter, every stack. Crux ships an OTel exporter that turns its event graph into spans your existing observability stack already understands.
import { config } from '@use-crux/core'
import { withTelemetry } from '@use-crux/otel'

export default config({
  plugins: [
    withTelemetry({
      serviceName: 'reply-api',
      // Lightweight exporter for Lambda /
      // Convex / Workers. Omit for the
      // standard Node OTel SDK path.
      exporter: {
        url: process.env.OTEL_ENDPOINT,
        headers: {
          'X-Api-Key': process.env.OTEL_KEY,
        },
      },
    }),
  ],
})

// Every generate(), tool, memory op,
// flow step, and judge is now an OTel span
// with gen_ai.* semantic conventions.
Exports to
Datadog · OTLP/HTTP
Honeycomb · OTLP/HTTP
Grafana Tempo · OTLP/gRPC
New Relic · OTLP/HTTP
Axiom · OTLP/HTTP
Jaeger · OTLP/gRPC
Two paths
Standard OTel for long-lived Node servers. Spans flow through your global TracerProvider.
Lightweight exporter for Lambda, Convex, Cloudflare Workers. Fire-and-forget OTLP/HTTP, no SDK to bundle.
Spans follow the OpenTelemetry GenAI semantic conventions (gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens), so your existing dashboards and alerting rules just work.
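For a sense of what those attributes look like on a span, here is a small sketch that maps one model call to gen_ai.* keys. The `GenerateRecord` type and `toGenAiAttributes` function are hypothetical names for illustration, not part of @use-crux/otel; the attribute keys are the ones named above plus gen_ai.usage.output_tokens, the output-side counterpart in the same conventions.

```typescript
// Hypothetical record for one generate() call; not the actual Crux type.
interface GenerateRecord {
  provider: string
  model: string
  inputTokens: number
  outputTokens: number
}

type Attributes = Record<string, string | number>

// Map the record onto OpenTelemetry GenAI semantic-convention attribute keys.
function toGenAiAttributes(record: GenerateRecord): Attributes {
  return {
    'gen_ai.system': record.provider,
    'gen_ai.request.model': record.model,
    'gen_ai.usage.input_tokens': record.inputTokens,
    'gen_ai.usage.output_tokens': record.outputTokens,
  }
}

// Token counts here are made-up values for the example.
const attributes = toGenAiAttributes({
  provider: 'openai',
  model: 'gpt-4o',
  inputTokens: 640,
  outputTokens: 252,
})
```

A dashboard or alert that already filters on `gen_ai.request.model` or sums `gen_ai.usage.input_tokens` would pick up spans shaped like this without a Crux-specific schema.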
Plugin system
Write your own sink.
Devtools and @use-crux/otel are themselves CruxPlugins. Build your own: tap any instrumentation hook, wrap every generate() with middleware, or stream judge scores to Slack.
import type { CruxPlugin } from '@use-crux/core'
import { subscribeObservability } from '@use-crux/core/observability'

export function slackAlerts(opts: {
  channel: string,
}): CruxPlugin {
  return {
    name: 'slack-alerts',
    install(runtime) {
      const unsubscribe = subscribeObservability((record) => {
        if (record.type !== 'span:end') return
        if (record.status === 'error') {
          fetch('https://hooks.slack.com/...', {
            method: 'POST',
            body: JSON.stringify({
              channel: opts.channel,
              text: `judge failed in run ${record.runId}`,
            }),
          }).catch(() => {}) // fire-and-forget
        }
      })
      return {
        dispose: unsubscribe,
      }
    },
  }
}
subscribeObservability
Subscribe once to canonical run, span, event, artifact, and edge records from the graph spine.
middleware
Wrap every generate()/stream() call: logging, timing, retry, multi-tenant scoping.
resolveHook
Observe prompt .resolve() calls: system composition, dropped contexts.
evalReporter
Stream eval progress to Slack, a notebook, your own dashboard.
Hooks fan out: multiple plugins can subscribe to the same event. Middleware layers. The later plugin wraps the earlier one. Devtools and @use-crux/otel are just two plugins reading the same stream.
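The two behaviors described here, hooks fanning out and middleware layering, can be sketched in a few lines. Everything in this block is a hypothetical stand-in: `createStream`, `compose`, and the synchronous `Generate` type are invented for the example, and the real CruxPlugin hook and middleware signatures may well differ (a real generate() is presumably async).

```typescript
// Hypothetical, synchronous stand-ins for illustration only.
type Generate = (prompt: string) => string
type Middleware = (next: Generate) => Generate
type Listener = (record: string) => void

// Fan-out: every subscriber sees every record.
function createStream() {
  const listeners: Listener[] = []
  return {
    subscribe(listener: Listener): void {
      listeners.push(listener)
    },
    emit(record: string): void {
      listeners.forEach((listener) => listener(record))
    },
  }
}

// Layering: apply middleware in install order, so each later one wraps
// everything installed before it and therefore runs outermost.
function compose(base: Generate, middlewares: Middleware[]): Generate {
  return middlewares.reduce((wrapped, mw) => mw(wrapped), base)
}

const stream = createStream()
const seenByDevtools: string[] = []
const seenByExporter: string[] = []
stream.subscribe((record) => { seenByDevtools.push(record) })
stream.subscribe((record) => { seenByExporter.push(record) })

// Each middleware emits a record on the way in and on the way out.
const traced = (label: string): Middleware => (next) => (prompt) => {
  stream.emit(`${label}:enter`)
  const result = next(prompt)
  stream.emit(`${label}:exit`)
  return result
}

const generate = compose((prompt) => `echo ${prompt}`, [traced('first'), traced('second')])
const reply = generate('hi')
// Both subscribers now hold: second:enter, first:enter, first:exit, second:exit
```

In this sketch the plugin installed second is entered first and exits last, which is one reasonable reading of "the later plugin wraps the earlier one"; both subscribers receive the identical sequence of records.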
Stop debugging from logs.
Trace your harness end to end. Locally, and in production.