Crux · Observability

One canonical event graph.

Crux instruments the whole harness around your LLM call. Memory ops, retrieval, guardrails, routing, evals. One structured event stream, rendered locally in devtools and exported as OpenTelemetry in production.

crux dev
recording
Traces
Memory
Evals
Security
Catalog
Recent
reply          00:14   1.2s    892t
triage-swarm   00:12   3.4s    4.1k
draft-edit     00:09   740ms   1.3k
classify       00:08   180ms   210
reply          00:04   2.1s    1.6k   guardrail.pii
summarize      00:02   980ms   780
reply          00:00   1.4s    910
prompt:reply · generate · gpt-4o · OK
resolve                 12ms
memory.read · recent    24ms
memory.read · facts     18ms
retriever · embed       32ms
retriever · search      21ms
guardrail · pii         11ms
router · cascade        6ms
generate · gpt-4o       714ms
constrain · zod         9ms
memory.write            12ms
observe.emit            5ms
Resolved system

<context id="brand" priority=30> Write in a casual tone.
<context id="memory" priority=20> Recent: 4 messages.
<context id="docs" priority=10> Docs available.
Answer using memory and retrieved docs.

Inspector
Output (typed)
{ answer: 'You asked me to remind you about the demo.' }
Judge
faithfulness   0.94
relevance      0.88
safety         1.00
OTel → Datadog
trace_id 4a7f…be0c
span_id c01e…
@use-crux/devtools · localhost:4400 · traces · memory · evals · security · index
Exports to → Datadog · Honeycomb · Grafana · New Relic · Axiom · Tempo

Why

Logs are a flashlight in a haystack.

When an LLM feature flakes in production, console.log of the request body isn't enough. You need the resolved system text, the retrieval that ran, the guardrail that fired, the model that answered, the schema that did or didn't parse, end to end, in one place.

Every block: Memory, retrieval, guardrails, routing, generate, eval. All emit the same shape.
Zero overhead when disabled: Plugins read the event stream. No subscribers, no cost.
Same in dev and prod: The local devtools and your OTel exporter are reading the same graph.
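
To make the "same shape" and "no subscribers, no cost" points concrete, here is a minimal sketch of an event stream that skips all work when nobody is listening. The record fields and the subscribe/emit names are assumptions made for illustration. They are not the Crux API.

```typescript
// Hypothetical record shape for illustration; the real Crux types may differ.
type ObsRecord = {
  type: 'span:start' | 'span:end' | 'event'
  block: 'memory' | 'retriever' | 'guardrail' | 'router' | 'generate' | 'eval'
  name: string
  runId: string
  durationMs?: number
}

type Listener = (record: ObsRecord) => void

let listeners: Listener[] = []

// Register a listener and get back a function that removes it again.
function subscribe(listener: Listener): () => void {
  listeners.push(listener)
  return () => {
    listeners = listeners.filter((l) => l !== listener)
  }
}

// Takes a factory rather than a finished record, so no record is built
// when no plugin is listening.
function emit(build: () => ObsRecord): void {
  if (listeners.length === 0) return
  const record = build()
  for (const listener of listeners) listener(record)
}
```

Passing a factory to emit is one way to keep the disabled path cheap: the only cost with no subscribers is a length check.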

Trace timeline

Every span. Every retry.

Crux groups spans into before / call / after lanes so a flaky retrieval doesn’t hide behind the model latency. Click any span for resolved inputs, outputs, and parents.

prompt:reply · trace_id 4a7f…be0c · 1.21s total · 892 tokens · gpt-4o · OK
BEFORE
memory.read · recent    24ms
memory.read · facts     18ms
retriever · embed       32ms
retriever · search      21ms
guardrail · pii         11ms
CALL
generate · gpt-4o       714ms
AFTER
constrain · zod         9ms
memory.write            12ms
evaluate · judge        6ms
Time axis: 0ms · 250ms · 500ms · 750ms · 1000ms · 1210ms
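
The lane grouping can be sketched as a small function. This version assumes a lane is decided purely by timing relative to the model call, which may not be how Crux assigns lanes. The Span type, the groupIntoLanes name, and the start offsets in the usage example are all hypothetical.

```typescript
type Lane = 'before' | 'call' | 'after'

// Hypothetical span shape: a name plus start offset and duration in ms.
type Span = { name: string; startMs: number; durationMs: number }

// Sort spans into lanes around the span named callName.
// Spans that finish before the call starts go to "before", spans that
// start after it ends go to "after", and anything overlapping the call
// is shown in the "call" lane.
function groupIntoLanes(spans: Span[], callName: string): Record<Lane, Span[]> {
  const lanes: Record<Lane, Span[]> = { before: [], call: [], after: [] }
  const call = spans.filter((s) => s.name === callName)[0]
  if (!call) {
    // No model call in this trace: treat everything as preparation.
    lanes.before = spans.slice()
    return lanes
  }
  const callEnd = call.startMs + call.durationMs
  for (const span of spans) {
    if (span === call) lanes.call.push(span)
    else if (span.startMs + span.durationMs <= call.startMs) lanes.before.push(span)
    else if (span.startMs >= callEnd) lanes.after.push(span)
    else lanes.call.push(span)
  }
  return lanes
}

// Usage: durations mirror the timeline above, start offsets are made up.
const lanes = groupIntoLanes(
  [
    { name: 'memory.read', startMs: 0, durationMs: 24 },
    { name: 'retriever.search', startMs: 30, durationMs: 21 },
    { name: 'generate', startMs: 100, durationMs: 714 },
    { name: 'constrain', startMs: 820, durationMs: 9 },
  ],
  'generate',
)
```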

Inspect the harness

What was read. What was written. How it scored.

Each block surfaces its own inspector panel. Memory shows the keys that were read and the writes that landed. Eval shows the judges that ran across the matrix.

crux dev · memory
Reads
Writes
Compaction
Snapshot
memory.recentMessages · read · 24ms · 4 items
user        Can you remind me about the demo?        2m
assistant   Sure, it’s scheduled Thursday at 10am.   2m
user        Move it to 11am please.                  1m
assistant   Done. Demo now Thursday 11am.            1m
memory.facts · about-user · read · 18ms · 3 keys
{
  timezone: 'Europe/Amsterdam',
  preferences: { tone: 'casual' },
  upcoming: [ { type: 'demo', day: 'Thursday' } ],
}
pending writes · 1 fact + 2 messages
@use-crux/devtools · store: pg · session 8f3a · traces · memory · evals · security · index
crux dev · evals
Live
Suites
Runs
Diff

faithfulness   0.93   +0.02 vs yesterday
relevance      0.87   −0.04 vs yesterday
safety         0.99   0.00 vs yesterday

Rolling average · 24h

By model · today

gpt-4o
claude-sonnet
gemini
gpt-4o-mini

Faithfulness · Relevance · Safety

@use-crux/devtools · prompt:reply · last 24h · traces · memory · evals · security · index

Cost & terminal

Tokens are spans. So are dollars.

Install withCostTracking() and model spend rolls up into the same event graph as everything else: by prompt, by model, by flow, by session. View it in devtools, in the crux cost CLI, or live in a terminal dashboard.

crux dev · cost
By prompt
By model
By flow
By session
Budgets

Week · $314.62 · −18% vs last

Top prompts

reply          8.2M   $142.10
triage-swarm   4.1M   $84.40
draft-edit     3.0M   $52.18
summarize      1.4M   $24.94
classify       420k   $11.00
@use-crux/devtools · last 7 days · traces · memory · evals · security · index
~/app · crux dev --tui
┏━ crux dev --tui · live ━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ traces/min   142     p95  1.4s    err  0.3%
┃ tokens/min   89.2k   $/h  4.18
┣━ recent ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ reply         1.21s  892t   gpt-4o
┃ triage-swarm  3.40s  4.1k   gpt-4o, claude
┃ draft-edit    0.74s  1.3k   gpt-4o
┃ reply         2.10s  1.6k   security.injection
┃ summarize     0.98s  780    gpt-4o-mini
┣━ judges (last 100) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ faithfulness ████████████████░░░ 0.93
┃ relevance    ██████████████░░░░░ 0.87
┃ safety       ███████████████████░ 0.99
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
j/k navigate · / filter · enter inspect · i index · q quit
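
The roll-up described in this section amounts to pricing each generate span and summing by a chosen dimension. The sketch below shows that arithmetic only. The CostSpan shape, the rollUp helper, and the per-million-token prices are placeholders invented for the example, not withCostTracking() internals or real model prices.

```typescript
// Hypothetical span fields relevant to cost.
type CostSpan = {
  prompt: string
  model: string
  session: string
  inputTokens: number
  outputTokens: number
}

// Price in dollars per one million tokens. Values below are placeholders.
type Price = { inputPerMTok: number; outputPerMTok: number }

function spanCost(span: CostSpan, prices: Record<string, Price>): number {
  const price = prices[span.model]
  if (!price) throw new Error('no price configured for model ' + span.model)
  return (
    (span.inputTokens * price.inputPerMTok) / 1000000 +
    (span.outputTokens * price.outputPerMTok) / 1000000
  )
}

// Sum dollar cost by one dimension of the span.
function rollUp(
  spans: CostSpan[],
  prices: Record<string, Price>,
  by: 'prompt' | 'model' | 'session',
): Record<string, number> {
  const totals: Record<string, number> = {}
  for (const span of spans) {
    const key = span[by]
    totals[key] = (totals[key] || 0) + spanCost(span, prices)
  }
  return totals
}

// Usage with made-up models and prices.
const prices: Record<string, Price> = {
  'model-a': { inputPerMTok: 2, outputPerMTok: 10 },
  'model-b': { inputPerMTok: 1, outputPerMTok: 2 },
}
const costSpans: CostSpan[] = [
  { prompt: 'reply', model: 'model-a', session: 's1', inputTokens: 1000000, outputTokens: 100000 },
  { prompt: 'reply', model: 'model-b', session: 's2', inputTokens: 1000000, outputTokens: 1000000 },
  { prompt: 'classify', model: 'model-b', session: 's1', inputTokens: 500000, outputTokens: 0 },
]
const byPrompt = rollUp(costSpans, prices, 'prompt')
const byModel = rollUp(costSpans, prices, 'model')
```

Because the same spans feed every grouping, the totals for each dimension always add up to the same overall spend.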

In production

OpenTelemetry, on day one.

One adapter, every stack. Crux ships an OTel exporter that turns its event graph into spans your existing observability stack already understands.

crux.config.ts
import { config } from '@use-crux/core'
import { withTelemetry } from '@use-crux/otel'

export default config({
  plugins: [
    withTelemetry({
      serviceName: 'reply-api',
      // Lightweight exporter for Lambda /
      // Convex / Workers. Omit for the
      // standard Node OTel SDK path.
      exporter: {
        url: process.env.OTEL_ENDPOINT,
        headers: {
          'X-Api-Key': process.env.OTEL_KEY,
        },
      },
    }),
  ],
})

// Every generate(), tool, memory op,
// flow step, and judge is now an OTel span
// with gen_ai.* semantic conventions.

Exports to

Datadog         OTLP/HTTP
Honeycomb       OTLP/HTTP
Grafana Tempo   OTLP/gRPC
New Relic       OTLP/HTTP
Axiom           OTLP/HTTP
Jaeger          OTLP/gRPC

Two paths

Standard OTel for long-lived Node servers. Spans flow through your global TracerProvider.
Lightweight exporter for Lambda, Convex, Cloudflare Workers. Fire-and-forget OTLP/HTTP, no SDK to bundle.
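
For the lightweight path, the essential work is turning finished spans into an OTLP/HTTP JSON body and posting it without waiting on the result. The sketch below builds such a body following the public OTLP JSON encoding. The FinishedSpan type, the toOtlpJson helper, and the scope name are assumptions for illustration, not the @use-crux/otel implementation.

```typescript
// Hypothetical finished-span shape. IDs are lowercase hex:
// 32 characters for a trace ID, 16 for a span ID.
type FinishedSpan = {
  traceId: string
  spanId: string
  name: string
  startMs: number
  endMs: number
  attributes: Record<string, string | number>
}

// OTLP JSON wraps each attribute value in a typed envelope.
function toAnyValue(v: string | number) {
  if (typeof v === 'string') return { stringValue: v }
  return Math.floor(v) === v ? { intValue: String(v) } : { doubleValue: v }
}

// Epoch milliseconds to the nanosecond string OTLP JSON expects.
function msToNanoString(ms: number): string {
  return String(Math.round(ms)) + '000000'
}

function toOtlpJson(serviceName: string, spans: FinishedSpan[]) {
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: 'service.name', value: toAnyValue(serviceName) }],
        },
        scopeSpans: [
          {
            scope: { name: 'example-exporter' }, // placeholder scope name
            spans: spans.map((s) => ({
              traceId: s.traceId,
              spanId: s.spanId,
              name: s.name,
              kind: 1, // SPAN_KIND_INTERNAL
              startTimeUnixNano: msToNanoString(s.startMs),
              endTimeUnixNano: msToNanoString(s.endMs),
              attributes: Object.keys(s.attributes).map((key) => ({
                key,
                value: toAnyValue(s.attributes[key]),
              })),
            })),
          },
        ],
      },
    ],
  }
}

// Fire-and-forget send, assuming a runtime with a global fetch:
// fetch(endpoint + '/v1/traces', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(toOtlpJson('reply-api', spans)),
// }).catch(() => {})
```

Swallowing the rejected promise keeps a failed export from surfacing as an error in the request path, at the price of silently dropped spans.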

Spans follow the OpenTelemetry GenAI semantic conventions (gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens), so dashboards and alerting rules that already key on those attributes keep working without changes.
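
As a rough picture of what that mapping involves, the sketch below converts a generate event into gen_ai.* span attributes. The GenerateEvent type and toGenAiAttributes function are hypothetical. Only the attribute keys come from the OpenTelemetry conventions, and those keys have changed between revisions, so check the version your backend expects.

```typescript
// Hypothetical event emitted when a model call finishes.
type GenerateEvent = {
  provider: string
  model: string
  inputTokens: number
  outputTokens: number
}

// Map the event onto GenAI semantic-convention attribute keys.
function toGenAiAttributes(e: GenerateEvent): Record<string, string | number> {
  return {
    'gen_ai.system': e.provider,
    'gen_ai.request.model': e.model,
    'gen_ai.usage.input_tokens': e.inputTokens,
    'gen_ai.usage.output_tokens': e.outputTokens,
  }
}

const attrs = toGenAiAttributes({
  provider: 'openai',
  model: 'gpt-4o',
  inputTokens: 640,
  outputTokens: 252,
})
```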

Plugin system

Write your own sink.

Devtools and @use-crux/otel are themselves CruxPlugins. Build your own: tap any instrumentation hook, wrap every generate() with middleware, or stream judge scores to Slack.

plugins/slack-alerts.ts
import type { CruxPlugin } from '@use-crux/core'
import { subscribeObservability } from '@use-crux/core/observability'

export function slackAlerts(opts: {
  channel: string,
}): CruxPlugin {
  return {
    name: 'slack-alerts',
    install(runtime) {
      // Alert on every span that ends with an error status.
      const unsubscribe = subscribeObservability((record) => {
        if (record.type !== 'span:end') return
        if (record.status === 'error') {
          fetch('https://hooks.slack.com/...', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
              channel: opts.channel,
              text: `span failed in run ${record.runId}`,
            }),
          }).catch(() => {}) // fire-and-forget
        }
      })
      // Stop listening when the plugin is disposed.
      return {
        dispose: unsubscribe,
      }
    },
  }
}
subscribeObservability: Subscribe once to canonical run, span, event, artifact, and edge records from the graph spine.
middleware: Wrap every generate()/stream() call: logging, timing, retry, multi-tenant scoping.
resolveHook: Observe prompt .resolve() calls: system composition, dropped contexts.
evalReporter: Stream eval progress to Slack, a notebook, your own dashboard.

Hooks fan out: multiple plugins can subscribe to the same event. Middleware layers: the later plugin wraps the earlier one. Devtools and @use-crux/otel are just two plugins reading the same stream.
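
The layering rule can be illustrated with a small composition helper. This is a sketch of the general wrapping pattern using a synchronous stand-in for generate(). The Generate and Middleware types and the applyMiddleware name are hypothetical, not the Crux middleware signature.

```typescript
// Simplified stand-in for a generate() call.
type Generate = (prompt: string) => string
type Middleware = (next: Generate) => Generate

// Apply middleware in registration order, so each later entry
// wraps everything registered before it.
function applyMiddleware(base: Generate, middlewares: Middleware[]): Generate {
  let wrapped = base
  for (const mw of middlewares) wrapped = mw(wrapped)
  return wrapped
}

const log: string[] = []

// Build a middleware that records when it runs around the inner call.
function tagged(name: string): Middleware {
  return (next) => (prompt) => {
    log.push(name + ':before')
    const result = next(prompt)
    log.push(name + ':after')
    return result
  }
}

const generate = applyMiddleware(
  (prompt) => {
    log.push('base')
    return 'echo ' + prompt
  },
  [tagged('first'), tagged('second')],
)

const output = generate('hi')
// log is now:
// second:before, first:before, base, first:after, second:after
```

The plugin registered last sees the call first and the result last, which is why ordering matters for things like retry and timing.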

Stop debugging from logs.

Trace your harness end to end. Locally, and in production.

$ npm install @use-crux/core @use-crux/otel