Crux · Observability

One canonical
event graph.

Crux instruments the whole harness around your LLM call: memory ops, retrieval, guardrails, routing, evals. Everything lands in one structured event stream, rendered locally in devtools and exported as OpenTelemetry in production.

crux dev

recording

Traces

Memory

Evals

Security

Catalog

Recent

reply · 00:14 · 1.2s · 892t
triage-swarm · 00:12 · 3.4s · 4.1k
draft-edit · 00:09 · 740ms · 1.3k
classify · 00:08 · 180ms · 210
reply · 00:04 · 2.1s · 1.6k · guardrail.pii ✕
summarize · 00:02 · 980ms · 780
reply · 00:00 · 1.4s · 910

prompt:reply · generate · gpt-4o

resolve                12ms
memory.read · recent   24ms
memory.read · facts    18ms
retriever · embed      32ms
retriever · search     21ms
guardrail · pii        11ms
router · cascade       6ms
generate · gpt-4o      714ms
constrain · zod        9ms
memory.write           12ms
observe.emit           5ms

Resolved system

<context id="brand" priority=30> Write in a casual tone.
<context id="memory" priority=20> Recent: 4 messages.
<context id="docs" priority=10> Docs available.
Answer using memory and retrieved docs.

Inspector

Output (typed)

{
  answer: 'You asked
    me to remind you
    about the demo.'
}

Judge

faithfulness 0.94
relevance 0.88
safety 1.00

OTel → Datadog

trace_id 4a7f…be0c
span_id c01e…

@use-crux/devtools · localhost:4400
traces · memory · evals · security · index

Exports to → Datadog · Honeycomb · Grafana · New Relic · Axiom · Tempo

Why

Logs are a flashlight in a haystack.

When an LLM feature flakes in production, a console.log of the request body isn't enough. You need the resolved system text, the retrieval that ran, the guardrail that fired, the model that answered, and whether the output parsed against its schema, all end to end and in one place.

Every block: memory, retrieval, guardrails, routing, generate, eval. All emit the same shape.

Zero overhead when disabled: plugins read the event stream. No subscribers, no cost.

Same in dev and prod: the local devtools and your OTel exporter are reading the same graph.
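One way to read "no subscribers, no cost" is an emitter that skips building the event when nobody is listening. The sketch below illustrates that idea only. `HarnessEvent`, `EventBus`, and the lazy `emit` factory are hypothetical names for this example, not the Crux API.

```typescript
// Hypothetical event shape shared by every block (illustrative only).
type HarnessEvent = {
  block: 'memory' | 'retrieval' | 'guardrail' | 'router' | 'generate' | 'eval'
  name: string
  durationMs: number
}

type Listener = (event: HarnessEvent) => void

class EventBus {
  private listeners = new Set<Listener>()

  // Returns an unsubscribe function.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener)
    return () => {
      this.listeners.delete(listener)
    }
  }

  // Takes a factory so the event object is only built when someone listens.
  emit(build: () => HarnessEvent): void {
    if (this.listeners.size === 0) return
    const event = build()
    this.listeners.forEach((listener) => listener(event))
  }
}

const bus = new EventBus()
let built = 0
const makeEvent = (): HarnessEvent => {
  built += 1
  return { block: 'memory', name: 'memory.read', durationMs: 24 }
}

bus.emit(makeEvent) // no subscribers: the factory never runs
const builtWhileIdle = built

const seen: HarnessEvent[] = []
const unsubscribe = bus.subscribe((event) => {
  seen.push(event)
})
bus.emit(makeEvent) // one subscriber: the event is built and delivered
unsubscribe()
bus.emit(makeEvent) // idle again: skipped
```

Because every block would emit the same `HarnessEvent` shape, a single subscriber can serve devtools, an exporter, or a custom sink without per-block adapters.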

Trace timeline

Every span. Every retry.

Crux groups spans into before / call / after lanes so a flaky retrieval doesn’t hide behind the model latency. Click any span for resolved inputs, outputs, and parents.

prompt:reply · trace_id 4a7f…be0c · 1.21s total · 892 tokens · gpt-4o

BEFORE

memory.read · recent   24ms
memory.read · facts    18ms
retriever · embed      32ms
retriever · search     21ms
guardrail · pii        11ms

CALL

generate · gpt-4o      714ms

AFTER

constrain · zod        9ms
memory.write           12ms
evaluate · judge       6ms

Time axis: 0ms to 1210ms
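The lane grouping can be sketched as a small reduction over span records. The `Span` type and `totalsByLane` helper are hypothetical, written for this example rather than taken from Crux. The durations are the ones shown in the timeline above.

```typescript
type Lane = 'before' | 'call' | 'after'
type Span = { name: string; lane: Lane; durationMs: number }

// Span names and durations copied from the timeline above.
const spans: Span[] = [
  { name: 'memory.read · recent', lane: 'before', durationMs: 24 },
  { name: 'memory.read · facts', lane: 'before', durationMs: 18 },
  { name: 'retriever · embed', lane: 'before', durationMs: 32 },
  { name: 'retriever · search', lane: 'before', durationMs: 21 },
  { name: 'guardrail · pii', lane: 'before', durationMs: 11 },
  { name: 'generate · gpt-4o', lane: 'call', durationMs: 714 },
  { name: 'constrain · zod', lane: 'after', durationMs: 9 },
  { name: 'memory.write', lane: 'after', durationMs: 12 },
  { name: 'evaluate · judge', lane: 'after', durationMs: 6 },
]

// Sum span durations per lane so slow pre- or post-processing stands out
// next to the model call itself.
function totalsByLane(input: Span[]): Record<Lane, number> {
  const totals: Record<Lane, number> = { before: 0, call: 0, after: 0 }
  for (const span of input) {
    totals[span.lane] += span.durationMs
  }
  return totals
}

const laneTotals = totalsByLane(spans)
console.log(laneTotals) // { before: 106, call: 714, after: 27 }
```

The per-lane sums do not add up to the 1.21s trace total, so some of the wall-clock time is not covered by the listed spans. This sketch makes no assumption about where that time goes.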

Inspect the harness

What was read. What was written. How it scored.

Each block surfaces its own inspector panel. Memory shows the keys that were read and the writes that landed. Eval shows the judges that ran across the matrix.

crux dev · memory

Reads

Writes

Compaction

Snapshot

memory.recentMessages · read · 24ms · 4 items

user: Can you remind me about the demo? (2m)
assistant: Sure, it’s scheduled Thursday at 10am. (2m)
user: Move it to 11am please. (1m)
assistant: Done. Demo now Thursday 11am. (1m)

memory.facts · about-user · read · 18ms · 3 keys

{
  timezone: 'Europe/Amsterdam',
  preferences: { tone: 'casual' },
  upcoming: [ { type: 'demo', day: 'Thursday' } ],
}

pending writes: 1 fact + 2 messages

@use-crux/devtools · store: pg · session 8f3a
traces · memory · evals · security · index

crux dev · evals

Live

Suites

Runs

Diff

faithfulness   0.93   +0.02 vs yesterday
relevance      0.87   −0.04 vs yesterday
safety         0.99   0.00 vs yesterday

Rolling average · 24h (chart)

By model · today (chart): gpt-4o, claude-sonnet, gemini, gpt-4o-mini. Series: faithfulness, relevance, safety.

@use-crux/devtools · prompt:reply · last 24h
traces · memory · evals · security · index

Cost & terminal

Tokens are spans. So are dollars.

Install withCostTracking() and model spend rolls up into the same event graph as everything else: by prompt, by model, by flow, by session. View it in devtools, with the crux cost CLI, or live in a terminal dashboard.
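To make the roll-up concrete, here is a minimal sketch of grouping per-call cost records by prompt or by model. The `CostRecord` shape and `rollUp` helper are hypothetical, not part of withCostTracking(). The sample values are made up for illustration, and costs are kept in integer cents to avoid floating-point drift.

```typescript
// Hypothetical per-call record; field names are assumptions for this sketch.
type CostRecord = { prompt: string; model: string; tokens: number; cents: number }
type Totals = { tokens: number; cents: number }

// Group records by one dimension and sum tokens and cost within each group.
function rollUp(records: CostRecord[], key: 'prompt' | 'model'): Map<string, Totals> {
  const totals = new Map<string, Totals>()
  for (const record of records) {
    const bucket = totals.get(record[key]) ?? { tokens: 0, cents: 0 }
    bucket.tokens += record.tokens
    bucket.cents += record.cents
    totals.set(record[key], bucket)
  }
  return totals
}

// Illustrative sample data only.
const records: CostRecord[] = [
  { prompt: 'reply', model: 'gpt-4o', tokens: 892, cents: 3 },
  { prompt: 'reply', model: 'gpt-4o', tokens: 910, cents: 3 },
  { prompt: 'classify', model: 'gpt-4o-mini', tokens: 210, cents: 1 },
  { prompt: 'summarize', model: 'gpt-4o-mini', tokens: 780, cents: 1 },
]

const byPrompt = rollUp(records, 'prompt')
const byModel = rollUp(records, 'model')

console.log(byPrompt.get('reply')) // { tokens: 1802, cents: 6 }
console.log(byModel.get('gpt-4o-mini')) // { tokens: 990, cents: 2 }
```

The same function would cover flow and session views if the record carried those fields, which is the point of keeping spend on the same records as everything else.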

crux dev · cost

By prompt

By model

By flow

By session

Budgets

Week: $314.62 (−18% vs last)

Top prompts

reply          8.2M   $142.10
triage-swarm   4.1M   $84.40
draft-edit     3.0M   $52.18
summarize      1.4M   $24.94
classify       420k   $11.00

@use-crux/devtools · last 7 days
traces · memory · evals · security · index

~/app · crux dev --tui

┏━ crux dev --tui · live ━━━━━━━━━━━━━━━━━━━━━━━━┓
┃
┃ traces/min   142     p95  1.4s    err  0.3%
┃ tokens/min   89.2k   $/h  4.18
┣━ recent ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ ● reply         1.21s  892t   gpt-4o
┃ ● triage-swarm  3.40s  4.1k   gpt-4o, claude
┃ ● draft-edit    0.74s  1.3k   gpt-4o
┃ ✕ reply         2.10s  1.6k   security.injection
┃ ● summarize     0.98s  780    gpt-4o-mini
┣━ judges (last 100) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ faithfulness ████████████████░░░ 0.93
┃ relevance    ██████████████░░░░░ 0.87
┃ safety       ███████████████████░ 0.99
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
j/k navigate · / filter · enter inspect · i index · q quit

In production

OpenTelemetry, on day one.

One adapter, every stack. Crux ships an OTel exporter that turns its event graph into spans your existing observability stack already understands.

crux.config.ts

import { config } from '@use-crux/core'
import { withTelemetry } from '@use-crux/otel'
 
export default config({
  plugins: [
    withTelemetry({
      serviceName: 'reply-api',
      // Lightweight exporter for Lambda /
      // Convex / Workers. Omit for the
      // standard Node OTel SDK path.
      exporter: {
        url: process.env.OTEL_ENDPOINT,
        headers: {
          'X-Api-Key': process.env.OTEL_KEY,
        },
      },
    }),
  ],
})
 
// Every generate(), tool, memory op,
// flow step, judge — now an OTel span
// with gen_ai.* semantic conventions.

Exports to

Datadog         OTLP/HTTP
Honeycomb       OTLP/HTTP
Grafana Tempo   OTLP/gRPC
New Relic       OTLP/HTTP
Axiom           OTLP/HTTP
Jaeger          OTLP/gRPC

Two paths

Standard OTel for long-lived Node servers. Spans flow through your global TracerProvider.
Lightweight exporter for Lambda, Convex, Cloudflare Workers. Fire-and-forget OTLP/HTTP, no SDK to bundle.
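For a sense of what a fire-and-forget OTLP/HTTP export involves, the sketch below builds an OTLP JSON trace payload and posts it without awaiting the response. It is not the @use-crux/otel implementation. `FinishedSpan`, `toOtlpJson`, `exportSpans`, the injected `post` function, and the collector URL are all assumptions for this example, as is treating the endpoint as a base URL to which `/v1/traces` is appended.

```typescript
// Hypothetical finished-span record for this sketch.
type FinishedSpan = {
  traceId: string // 32 hex characters
  spanId: string // 16 hex characters
  name: string
  startMs: number // Unix epoch milliseconds
  endMs: number
}

// OTLP JSON encodes 64-bit timestamps as decimal strings of nanoseconds.
// Appending six zeros avoids exceeding Number.MAX_SAFE_INTEGER.
const msToNanoString = (ms: number): string => `${Math.trunc(ms)}000000`

function toOtlpJson(serviceName: string, spans: FinishedSpan[]) {
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
        },
        scopeSpans: [
          {
            scope: { name: 'example-exporter' }, // hypothetical scope name
            spans: spans.map((span) => ({
              traceId: span.traceId,
              spanId: span.spanId,
              name: span.name,
              kind: 1, // SPAN_KIND_INTERNAL
              startTimeUnixNano: msToNanoString(span.startMs),
              endTimeUnixNano: msToNanoString(span.endMs),
            })),
          },
        ],
      },
    ],
  }
}

// The HTTP call is injected so the sketch runs without a network.
type Post = (
  url: string,
  init: { method: 'POST'; headers: Record<string, string>; body: string },
) => Promise<unknown>

function exportSpans(
  post: Post,
  endpoint: string,
  headers: Record<string, string>,
  serviceName: string,
  spans: FinishedSpan[],
): void {
  // Fire-and-forget: do not await, and swallow export failures.
  post(`${endpoint}/v1/traces`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', ...headers },
    body: JSON.stringify(toOtlpJson(serviceName, spans)),
  }).catch(() => {})
}

// Demo with a fake sender that records what would be posted.
const sent: { url: string; body: string }[] = []
const fakePost: Post = (url, init) => {
  sent.push({ url, body: init.body })
  return Promise.resolve()
}

exportSpans(fakePost, 'https://collector.example', { 'X-Api-Key': 'demo' }, 'reply-api', [
  {
    traceId: '0af7651916cd43dd8448eb211c80319c',
    spanId: 'b7ad6b7169203331',
    name: 'generate',
    startMs: 1700000000000,
    endMs: 1700000000714,
  },
])
```

In a real runtime you could pass `(url, init) => fetch(url, init)` as `post`. Serverless platforms may freeze the process before an un-awaited request completes, so a production exporter typically needs a flush hook as well.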

Spans follow the OpenTelemetry GenAI semantic conventions (gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens), so dashboards and alerting rules already keyed to those attributes work without remapping.
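As an illustration of what those attributes look like on a span, here is a sketch that maps a generate() result onto gen_ai.* keys. The `GenerateResult` type and `toGenAiAttributes` helper are hypothetical and the token counts are sample values. `gen_ai.usage.output_tokens` is not named above but belongs to the same convention.

```typescript
// Hypothetical summary of one model call; not a Crux type.
type GenerateResult = {
  provider: string
  model: string
  inputTokens: number
  outputTokens: number
}

// Map the result onto OpenTelemetry GenAI semantic-convention attribute keys.
function toGenAiAttributes(result: GenerateResult): Record<string, string | number> {
  return {
    'gen_ai.system': result.provider,
    'gen_ai.request.model': result.model,
    'gen_ai.usage.input_tokens': result.inputTokens,
    'gen_ai.usage.output_tokens': result.outputTokens,
  }
}

// Sample values for illustration only.
const attributes = toGenAiAttributes({
  provider: 'openai',
  model: 'gpt-4o',
  inputTokens: 700,
  outputTokens: 192,
})

console.log(attributes['gen_ai.request.model']) // gpt-4o
```

A dashboard that already groups by `gen_ai.request.model` would pick these spans up with no Crux-specific mapping.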

Plugin system

Write your own sink.

Devtools and @use-crux/otel are themselves CruxPlugins. Build your own: tap any instrumentation hook, wrap every generate() with middleware, or stream judge scores to Slack.

plugins/slack-alerts.ts

import type { CruxPlugin } from '@use-crux/core'
import { subscribeObservability } from '@use-crux/core/observability'
 
export function slackAlerts(opts: {
  channel: string,
}): CruxPlugin {
  return {
    name: 'slack-alerts',
    install(runtime) {
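      const unsubscribe = subscribeObservability((record) => {
        // Only react to finished spans that ended in an error.
        if (record.type !== 'span:end') return
        if (record.status === 'error') {
          fetch('https://hooks.slack.com/...', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
              channel: opts.channel,
              // Any errored span triggers this, not only judge spans.
              text: `span failed in run ${record.runId}`,
            }),
          }).catch(() => {}) // fire-and-forget
        }
      })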
      return {
        dispose: unsubscribe,
      }
    },
  }
}

subscribeObservability: Subscribe once to canonical run, span, event, artifact, and edge records from the graph spine.

middleware: Wrap every generate()/stream() call: logging, timing, retry, multi-tenant scoping.

resolveHook: Observe prompt .resolve() calls: system composition, dropped contexts.

evalReporter: Stream eval progress to Slack, a notebook, your own dashboard.

Hooks fan out: multiple plugins can subscribe to the same event. Middleware layers: the later plugin wraps the earlier one. Devtools and @use-crux/otel are just two plugins reading the same stream.
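The layering rule can be sketched as plain function composition. The `Generate` and `Middleware` types and the `applyMiddleware` helper are hypothetical, and the call is synchronous here for brevity where a real generate() would be async.

```typescript
type Generate = (prompt: string) => string
type Middleware = (next: Generate) => Generate

// Each later middleware wraps the result of the earlier ones,
// so the last one registered is the outermost layer.
function applyMiddleware(base: Generate, middleware: Middleware[]): Generate {
  return middleware.reduce((wrapped, layer) => layer(wrapped), base)
}

const log: string[] = []

// Builds a middleware that records when it is entered and exited.
const tag = (label: string): Middleware => (next) => (prompt) => {
  log.push(`${label}:enter`)
  const output = next(prompt)
  log.push(`${label}:exit`)
  return output
}

const base: Generate = (prompt) => `echo ${prompt}`
const generate = applyMiddleware(base, [tag('first'), tag('second')])

const output = generate('hi')
console.log(output) // echo hi
console.log(log) // [ 'second:enter', 'first:enter', 'first:exit', 'second:exit' ]
```

With this ordering, a timing middleware registered last measures everything the earlier layers add, including their retries.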

Stop debugging from logs.

Trace your harness end to end. Locally, and in production.

$ npm install @use-crux/core @use-crux/otel