Observability

Canonical graph record contract shared by Crux runtimes, devtools, TUI, and the Go backend.

import {
  CruxGraphRecordBatchSchema,
  createHttpObservabilityTransport,
  createCruxRunId,
  createCruxSpanId,
  normalizeObservedError,
  observe,
} from '@crux/core/observability'

@crux/core/observability is the canonical graph contract for Crux execution traces. It defines the record shapes emitted by the TypeScript runtime and consumed by the local Go devtools backend.

The contract is lifecycle based:

run:start
span:start
span:event
artifact
edge
span:end
run:end
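
As a rough illustration of how a consumer might reason about these lifecycle records, the sketch below finds spans that were started but not yet ended within a batch. The record shape (`SketchRecord`, `spanId` on span-scoped records) and the helper name are assumptions made for this example, not the actual exported schema, and the helper assumes records arrive in order.

```typescript
// Hypothetical, simplified record shape: real graph records carry more fields.
type LifecycleType =
  | 'run:start'
  | 'span:start'
  | 'span:event'
  | 'artifact'
  | 'edge'
  | 'span:end'
  | 'run:end'

interface SketchRecord {
  type: LifecycleType
  runId: string
  spanId?: string // assumed to be present on span-scoped records
}

// Returns ids of spans that have a span:start but no span:end in the batch.
function findOpenSpans(records: SketchRecord[]): string[] {
  const openIds = new Set<string>()
  for (const record of records) {
    if (record.spanId === undefined) continue
    if (record.type === 'span:start') openIds.add(record.spanId)
    if (record.type === 'span:end') openIds.delete(record.spanId)
  }
  return Array.from(openIds)
}

const sketchBatch: SketchRecord[] = [
  { type: 'run:start', runId: 'run-1' },
  { type: 'span:start', runId: 'run-1', spanId: 'span-1' },
  { type: 'span:event', runId: 'run-1', spanId: 'span-1' },
  { type: 'span:start', runId: 'run-1', spanId: 'span-2' },
  { type: 'span:end', runId: 'run-1', spanId: 'span-1' },
]

const stillOpenSpans = findOpenSpans(sketchBatch)
```

In this batch only span-2 is still open, which is the kind of state a read model would have to carry until a later span:end arrives.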

The backend turns these write records into read models such as run lists, RunDetail, resource activity, artifact previews, filters, and search facets. Web devtools and the TUI should read those backend models instead of inferring graph semantics themselves.

RunDetail is the default one-run inspection model. It includes the rooted presentation tree, flattened rows, display labels, attached details, events, artifacts, relation overlays, diagnostics, facets, aggregate metric rollups, status rollups, inspection sections, source metadata, request/context composition, and a span index. Detail ownership is resolved centrally by the Go service: explains edges and presentation.ownerSpanId attach prompt/context/memory/routing metadata to the generation, agent, tool, step, or composition they explain, while chronology is only a fallback. Routing decisions fold onto their selected concrete generation even when the canonical graph wraps the generation under routing.router / routing.cascade; quiet constraint, guardrail, citation, scoring, and security warning spans fold as safety/detail evidence, while execution-changing governance remains visible. If a collector only receives completion-side records for a span, RunDetail keeps them as details instead of inventing an anonymous operation row.

Generation nodes and aggregating nodes can include request?: CruxRunDetailRequest. Exact generation requests are built from the consumed request-shaped messages artifact, linked context contributions, linked prompt budget, and request tool names. The backend also collects context contributions and prompt budgets produced under the nearest enclosing request scope before the generation starts, which covers framework agents whose Crux prompt resolves under agent.run before the provider stream begins. Nested framework agent steps that only emit output-shaped message artifacts inherit the nearest enclosing generation request and expose it with mode: "inherited" plus representative.strategy: "nearest-ancestor-request".

When Convex Agent emits both call-args and thread-context request artifacts, the backend prefers thread-context and preserves prior-turn fields such as allMessages, recent, inputMessages, inputPrompt, existingResponses, and search on request.messages; inherited agent steps also include earlier sibling generation outputs as previousStepMessages. Aggregating run, stream, agent, composition, and flow nodes use the backend-selected final descendant generation as the representative effective request and include turns[] for the descendant generation breakdown.

request.basePrompt.sourceId uses the concrete prompt id when known and otherwise reports messages.system / messages.prompt; base prompt segments come from systemBlocks, which preserve explicit segments and infer direct string-template interpolation from unambiguous primitive input values. request.modelSummary groups concrete generation models for exact, inherited, and aggregate requests, marks mixed-model aggregates, and turns[] carries per-generation model, provider, status, and promptId; flattened rows[] carries the row's effective model and provider as well. Convex Agent aggregate and per-step generation spans emit their configured languageModel so framework turns have model provenance, while nested tool-call or flow generations retain their own model identity. Clients should render these fields directly instead of walking descendant artifacts or deduping context contributions.

RunDetail also reconciles missing terminal lifecycle records from reliable canonical signals. Expired operation.deadline events mark the timed operation and its still-open ancestors as incomplete observability in the presentation tree, while future deadlines keep active long calls from being marked stale too early. Execution-changing governance rolls ancestors up to blocked, and intentional waits roll ancestors up to suspended. Reconciliation changes only the read model: the canonical records remain unchanged, and an incomplete-observability marker is a telemetry diagnostic, not an application error.
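
The deadline rule can be pictured with a small sketch. It is not the Go backend's implementation: the `SketchSpan` shape, the `deadlineAt` timestamp as a plain number, and the function name are all assumptions for illustration, and the span graph is assumed to be a tree with no cycles.

```typescript
// Hypothetical span shape used only for this sketch.
interface SketchSpan {
  id: string
  parentId?: string
  ended: boolean // true once a terminal span:end was received
  deadlineAt?: number // assumed epoch-millisecond deadline from operation.deadline
}

// Marks open spans whose deadline has expired, plus their still-open ancestors.
// Spans with a future deadline are left alone so long calls are not marked early.
function incompleteSpanIds(spans: SketchSpan[], now: number): string[] {
  const byId = new Map<string, SketchSpan>()
  for (const span of spans) byId.set(span.id, span)

  const marked = new Set<string>()
  for (const span of spans) {
    const expired =
      !span.ended && span.deadlineAt !== undefined && span.deadlineAt <= now
    if (!expired) continue

    // Walk from the timed operation up to the root, marking open spans only.
    let current: SketchSpan | undefined = span
    while (current !== undefined) {
      if (!current.ended) marked.add(current.id)
      current = current.parentId === undefined ? undefined : byId.get(current.parentId)
    }
  }
  return Array.from(marked).sort()
}
```

The input spans are never mutated, which mirrors the point that reconciliation affects the presentation tree while the canonical records stay as emitted.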

Contextual retrieval, memory, and embedding spans fold into attached details when they are request-input evidence for a generation. Operational retrieval inside a tool, flow, composition, or agent boundary remains visible even when that branch lives under an agent generation stream; retrieval query/embed internals can still fold into the retrieval pipeline node.

Error evidence

Thrown errors are represented with a shared normalized shape. observe.span() and observe.openSpan().error() attach:

A compact error summary to the terminal span:end or run:end.
A span:event named exception with OpenTelemetry-style attributes.
An error.stack artifact when a stack trace exists.
An error.raw artifact containing a bounded, redacted, JSON-safe representation of the thrown value.
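
To make the idea of a normalized, bounded error shape concrete, here is a minimal sketch. It is not the behavior of normalizeObservedError(): the `SketchErrorEvidence` shape, the 200-character bound, and the helper names are assumptions, and redaction is omitted entirely.

```typescript
// Hypothetical shape for illustration; the real normalized shape may differ.
interface SketchErrorEvidence {
  name: string
  message: string
  stack: string | undefined
  raw: string // bounded JSON text standing in for the error.raw artifact
}

const MAX_RAW_CHARS = 200 // assumed bound, not the library's actual limit

// Serialize any value to JSON text, falling back to String() for values
// JSON cannot represent (circular structures, BigInt, undefined).
function boundedPreview(value: unknown): string {
  let text: string
  try {
    text = JSON.stringify(value) ?? String(value)
  } catch {
    text = String(value)
  }
  return text.length > MAX_RAW_CHARS ? text.slice(0, MAX_RAW_CHARS) + '...' : text
}

function sketchNormalizeError(thrown: unknown): SketchErrorEvidence {
  if (thrown instanceof Error) {
    return {
      name: thrown.name,
      message: thrown.message,
      stack: thrown.stack,
      raw: boundedPreview({ name: thrown.name, message: thrown.message }),
    }
  }
  // Non-Error throws (strings, objects, undefined) still yield evidence.
  return {
    name: 'NonErrorThrown',
    message: boundedPreview(thrown),
    stack: undefined,
    raw: boundedPreview(thrown),
  }
}
```

The point of the sketch is that any thrown value, including a string or undefined, ends up in one inspectable shape whose raw preview cannot grow without limit.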

The Go backend promotes span.error, error.stack, and error.raw into RunDetail.inspection.errors. Web devtools and the TUI render that section for any primitive, including tools, generation, retrieval stages, flow steps, eval cases, routing attempts, and custom spans.

Crux only uses exception evidence for actual thrown failures. Non-throwing outcomes stay in their domain-specific records: approval denial is tool.approval plus model-facing tool output, guardrail blocks are guardrail.report / guardrail.blocked, constraint retries are constraint.retry, retrieval zero hits are result counts and retrieval.hits, invalid citations are citation.report, cascade tier rejection is routing metadata, flow suspension/cancellation is flow status, and stream length or content_filter is a finish reason.

Runtime bridge and eval-runner failures can happen outside a normal span. Those responses carry the same normalized details under command.error.details, so web devtools and TUI surfaces can show name, message, stack, phase, kind, and raw details even when no span exists.

Exports

Export	Purpose
`CruxGraphRecordSchema`	Zod schema for one graph record
`CruxGraphRecordBatchSchema`	Zod schema for a batch of records
`CRUX_OBSERVABILITY_SCHEMA_VERSION`	Current graph schema version
`createCruxRunId()`	Create a branded run id
`createCruxSpanId()`	Create a branded span id
`createCruxArtifactId()`	Create a branded artifact id
`normalizeObservedError()`	Normalize an unknown thrown value into safe inspectable error evidence
`observedErrorSummary()`	Produce the compact terminal span/run error summary
`toSafeJsonValue()`	Convert unknown values into bounded, redacted JSON-safe previews

Built-in primitives use canonical names such as generation.call, generation.stream, prompt.resolve, prompt.budget, context.predicate, context.resolve, memory.read, memory.write, retrieval.pipeline, retrieval.query, retrieval.stage, indexing.pipeline, ingest.parse, corpus.sync, embedding.call, cache.lookup, cost.record, compaction.run, eval.run, eval.case, scoring.judge, citation.check, feedback.record, workspace.operation, plan.operation, task.operation, skill.load, security.warning, routing.router, routing.cascade, fallback.attempt, constraint.check, constraint.retry, guardrail.run, agent.run, flow.run, flow.step, flow.suspension, composition.parallel, composition.pipeline, composition.consensus, composition.swarm, handoff.prepare, delegate.invoke, and tool.call.

Custom edge types and artifact kinds must use the custom.* namespace. Unknown custom values are preserved and can be displayed generically, but specialized indexing and layouts are reserved for canonical values.
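
The namespace rule can be sketched as a small classifier. This is only an illustration: the canonical set below is a three-item subset of the kinds named in this document, and the `unrecognized` class and function name are assumptions rather than part of the contract.

```typescript
// A small subset of canonical artifact kinds named in this document.
const CANONICAL_KINDS_SUBSET = new Set<string>([
  'context.contribution',
  'prompt.budget',
  'retrieval.hits',
])

type KindClass = 'canonical' | 'custom' | 'unrecognized'

function classifyArtifactKind(kind: string): KindClass {
  if (CANONICAL_KINDS_SUBSET.has(kind)) return 'canonical'
  // custom.* requires a non-empty suffix after the namespace prefix.
  if (kind.startsWith('custom.') && kind.length > 'custom.'.length) return 'custom'
  return 'unrecognized'
}
```

Under these assumptions, a client could give canonical kinds specialized layouts, render custom.* kinds with a generic view, and flag everything else.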

Common detail artifacts use stable kinds so RunDetail.inspection can route them without UI-specific parsing:

Artifact kind	Primary payload
`context.contribution`	Context inclusion state, source, primitive kind, priority, tokens, cache status, exclusion reason, segmented static/dynamic text, and contributed tool names in `injectedTools`.
`prompt.budget`	Prompt token budget totals and dropped context contribution previews.
`retrieval.hits`	Query, limit, returned hits, hit ranks, source/chunk ids, scores, and pipeline stage summaries.
`citation.report` / `score.report`	Grounding markers, optional output anchors (`start`, `end`, `outputQuote`), score verdicts, judge rationales, and pass/fail detail.
`composition.report`	Parallel branches, pipeline stages, consensus votes, or swarm handoff path.
`handoff.payload` / `delegate.report`	Transfer contract data, hop metadata, input/output sizes, and delegate result preview.
`constraint.report` / `guardrail.report`	Constraint attempts/retries and guardrail action reports with before/after previews when available.
`routing.report` / `cache.report`	Router/cascade/fallback choices, full cascade tier ladders including skipped/not-reached tiers, evaluator notes/confidence/budgets, semantic cache hits/writes, and saved work.
`compaction.report`	Before/after token counts and summary previews.
`memory.snapshot` / `memory.recall` / `memory.diff`	Memory state snapshots, recalled block previews (`blockKind`, `key`, `preview`, `score`), and before/after write summaries with added/removed block summaries.
`embedding.report` / `indexing.report` / `ingest.report` / `corpus.report`	Embedding shape/cache metrics, indexing totals/stage counts, loader status, and corpus source ledger summaries.
`security.report`	Prompt-injection warning severity, pattern, field/location, action, and bounded preview.

Runtime emitters

Use observe.run() for user-facing execution roots and observe.span() for inspectable work inside a run. Standalone spans automatically create an implicit run, so helpers like pipeline, consensus, or swarm can be called directly and still produce a complete trace.

Crux's built-in orchestration helpers emit these records automatically. parallel() creates sibling agent.run spans, pipeline() records flow.step plus nested agent.run spans, flow() / withFlow() records flow.run, step children, and flow.suspension timeline markers for intentional waits, consensus() nests its voter fanout, and swarm / delegate handoffs create handoff.prepare spans, canonical input/output artifacts, handoff.payload artifacts, and relation edges.

Generation and streaming orchestration record operation deadlines when a timeout is configured. @crux/ai passes timeoutMs into core orchestration, which records timeoutMs / deadlineAt and emits operation.deadline before the provider call begins.

Prompt and safety helpers emit the same graph contract automatically:

prompt.resolve() opens prompt.resolve, records conditional context inclusion/exclusion as context.predicate, and records resolved context text as context.resolve spans with context.contribution artifacts. Contributions that provide tools include injectedTools, including dropped text contributions whose tools still participate in the request. Direct injectables, retrievers/grounding, memory, and blackboards emit tool-only context.contribution previews when they contribute tools without resolved text. Segmented system content preserves segments: { text, dynamic, source? }[], staticTokens, and dynamicTokens on context previews, inspect parts, and budget-dropped previews. Generation spans that consume the resolved prompt link back to included context artifacts and prompt-budget artifacts, and their messages artifact includes request toolNames; the Go read model projects those records into RunDetailNode.request. Token-budget decisions emit prompt.budget artifacts with dropped contribution previews.
Memory blocks and blackboards emit memory.read / memory.write spans, attach memory.snapshot artifacts when a read or write has inspectable state, attach memory.recall artifacts for non-empty read result previews, attach memory.diff artifacts for before/after write summaries, and connect those artifacts with semantic memory edges. Empty reads still emit the span but omit memory.recall, so UIs should not render a recalled-block card for zero-result reads. Raw namespaces are represented by namespaceHash; blackboard snapshots and diffs carry memoryType: "blackboard".
Adapter-managed tools, including live @crux/ai tool calls, emit tool.call spans, consume tool.args artifacts, produce raw and model-facing tool.result artifacts, and record thrown execute() failures as rich error evidence on the same tool span while preserving the model-facing error output. Model-emitted tool intents attach to the generation as tool.request artifacts and are linked to the eventual tool execution by tool call id. Convex Agent streams expose an outer generation.stream plus child generation.call spans for each AI SDK step, so tool-call turns and later text turns are visible as consecutive streamed generations. Convex Agent fallback tool-call parts follow the same contract, so stop-condition tools remain inspectable even when no handler executes. Approval request, approval, denial, and token mismatch paths emit tool.approval spans.
Retrieval calls emit retrieval.query spans with bounded retrieval.hits artifacts and retrieval.returned edges. Retrieval pipelines open a parent retrieval.pipeline span and record fanout/query/hit stages as child retrieval.stage spans with inspectable stage output artifacts.
Indexing and corpus helpers emit indexing.pipeline, ingest.parse, and corpus.sync spans. Document transforms, chunkers, chunk transforms, dry runs, cache outcomes, loader results, and source-ledger summaries are visible through child spans plus indexing.report, ingest.report, and corpus.report artifacts.
Embeddings emit embedding.call spans with provider name, kind, dimensions, batch shape, usage, cost, governance metrics, and bounded embedding.report artifacts. Configured embedding caches emit nested cache.lookup spans with hit/miss/write counts and per-entry hit/miss events.
Semantic cache middleware emits cache.lookup spans for lookup, write, and skip decisions. Hit, miss, write, and skip events carry prompt id, scope hash, version, query hash, score, result kind, and timing metadata; hit/write decisions also attach cache.report artifacts.
Cost tracking emits cost.record spans with call attribution, token/cost totals, and cost.warn / cost.limit events. Token budget checks emit prompt.budget spans with source breakdowns and pressure levels. Compaction helpers emit compaction.run spans and compaction.report artifacts with before/after token counts, compression ratios, focus, model labels, and bounded summary previews.
Quality runs emit eval.run spans and eval.case children for each case/variant. LLM judges emit scoring.judge spans with bounded score.report artifacts. Citation validation emits citation.check spans with bounded citation.report artifacts; structured citations may carry output-text anchors (start, end, and outputQuote) so clients can place inline superscripts without reparsing the generated answer. Feedback inbox writes emit feedback.record spans with bounded feedback artifacts.
Workspaces emit workspace.operation spans for list/read/write/edit/delete operations, with namespace hashes and bounded output artifacts. Plans and tasks emit plan.operation / task.operation spans for mutations, including task create/update/remove/discard paths. File and registry skills emit skill.load spans, and configured prompt-injection diagnostics emit security.warning spans plus security.report artifacts.
Router, cascade, and fallback wrappers emit routing.router, routing.cascade, and fallback.attempt spans. Router spans record classification and selected route/model; cascade spans record attempted tiers, rejection/acceptance, budget exits, provider errors, and full ordered configured tier metadata for skipped/not-reached tiers; fallback attempt spans record model attempts, bounded error classifications, success metadata, and fallback.attempt edges between failed and next attempts. Each path attaches a routing.report artifact for Run Detail routing cards.
The Safety session's constraint phase opens a grouped constraint.check span, records each individual check as a child span, attaches constraint.report artifacts, and emits constraint.retry spans plus constraint.retry edges when feedback triggers regeneration.
The Safety session's guardrail phases open grouped and per-guard guardrail.run spans, attach guardrail.report artifacts for pass, warn, redact, transform, hold, and block actions, and emit guardrail.blocked edges for blocks.

import { observe } from '@crux/core/observability'

await observe.run({ name: 'support reply', rootPrimitive: 'agent.run' }, async () => {
  await observe.span({ name: 'retrieve docs', family: 'retrieval', primitive: 'retrieval.query' }, async () => {
    observe.event({ name: 'query.built', attributes: { terms: ['refund', 'policy'] } })
  })
})

The runtime uses async context propagation on Node.js and preserves parent span stacks across normal async work. For async work that crosses a boundary where context is not preserved, pass an explicit captured context:

const context = observe.captureContext()

queueMicrotask(() => {
  void observe.withContext(context, async () => {
    await observe.span({ name: 'delegate', family: 'delegate', primitive: 'delegate.invoke' }, async () => {
      // delegated work
    })
  })
})

Delivery

The default runtime path is non-blocking: transport failures are captured as diagnostics and never throw into user code. In serverless runtimes, await a bounded flush before returning so queued records are not dropped when the request ends:

await observe.flush({ timeoutMs: 5000 })

Observability delivery is queued into batches. The first delivery starts immediately for live devtools, and later records coalesce per microtask before being sent as bounded concurrent batches. This avoids head-of-line blocking when a collector, tunnel, or Convex network hop is slow. HTTP delivery normalizes unknown preview values into JSON-safe shapes and isolates rejected records inside a failed batch, so one malformed or oversized detail artifact does not prevent terminal lifecycle records from reaching the backend. The Go observability backend reconciles out-of-order lifecycle records by stable ids and timestamps. The runtime never blocks user code on normal emits, but flush() and shutdown() wait for pending deliveries so serverless and Convex actions can preserve span starts, ends, artifacts, and events before the worker exits.
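
The queueing behavior described above can be sketched as follows. This is a simplified model and not the runtime's delivery code: `createCoalescingQueue`, its parameters, and the injectable scheduler are assumptions for illustration, and concurrent sends, retries, and flush() are left out.

```typescript
type Schedule = (task: () => void) => void

// Default scheduler: run the task on the next microtask.
const microtaskSchedule: Schedule = (task) => {
  void Promise.resolve().then(task)
}

function createCoalescingQueue<T>(
  send: (batch: T[]) => void,
  maxBatchSize: number,
  schedule: Schedule = microtaskSchedule,
) {
  const batchSize = Math.max(1, maxBatchSize) // assumes a positive integer
  const pending: T[] = []
  let drainScheduled = false
  let sentFirst = false

  function drain(): void {
    drainScheduled = false
    // Split whatever accumulated into bounded batches.
    while (pending.length > 0) {
      send(pending.splice(0, batchSize))
    }
  }

  return {
    emit(record: T): void {
      if (!sentFirst) {
        sentFirst = true
        send([record]) // first delivery starts immediately
        return
      }
      pending.push(record)
      if (drainScheduled) return // already coalescing into the next drain
      drainScheduled = true
      schedule(drain)
    },
  }
}
```

Passing the scheduler in as a parameter keeps the sketch deterministic to exercise: a caller can collect scheduled tasks in an array and run them by hand instead of waiting on real microtasks.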

Convex actions can use the helpers from @crux/convex:

import { withObservabilityFlush } from '@crux/convex/observability'

export const run = internalAction({
  args: {},
  handler: withObservabilityFlush(async (ctx, args) => {
    // Crux work here
  }),
})

@crux/convex/server actions flush automatically by default. Pass observabilityFlushTimeoutMs: false only when an outer serverless boundary already flushes.

Use createHttpObservabilityTransport() to send canonical batches to the local Crux backend endpoint:

import { config } from '@crux/core'

config({
  observability: {
    serverUrl: 'http://localhost:4400',
    delivery: { maxPendingDeliveries: 1000 },
  },
})

The observability domain is explicit, opt-in export configuration: config() with no observability block does not install telemetry, cloud upload, or raw-content capture. Use devtools only for local UI/control/tunnel/bridge behavior.

The HTTP transport posts { records } to /api/observability/records. The Go backend owns validation, persistence, read-model building, filtering, search, and subscriptions. Devtools keeps terminal run-detail pages refreshing for a short grace window, which catches late Convex/serverless flushes that arrive after the terminal run update. Convex Agent container streams fold into details in the Go read model, while step-level streamed generation turns, tools, handoffs, and delegated flows render as chronological agent children. Promoted tool executions preserve source.canonicalParentSpanId and sort after their request generation by relation, even when Convex action timestamps arrive slightly out of order.

Live token visualization uses the same observability channel without turning every chunk into a whole-run refetch. span:event records named token.delta are persisted as canonical events and also broadcast as append-only token.delta notifications keyed by runId, spanId, and eventId.
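
A client consuming these notifications might keep a small append-only buffer per span, as in the sketch below. The notification shape is an assumption: only runId, spanId, and eventId come from the description above, while the `text` field and the helper names are hypothetical.

```typescript
interface SketchTokenDelta {
  runId: string
  spanId: string
  eventId: string
  text: string // hypothetical payload field
}

function createTokenBuffer() {
  const seenEvents = new Set<string>()
  const textBySpan = new Map<string, string>()

  return {
    // Append a delta once; replays with the same eventId are ignored.
    append(delta: SketchTokenDelta): void {
      const spanKey = JSON.stringify([delta.runId, delta.spanId])
      const eventKey = JSON.stringify([delta.runId, delta.spanId, delta.eventId])
      if (seenEvents.has(eventKey)) return
      seenEvents.add(eventKey)
      textBySpan.set(spanKey, (textBySpan.get(spanKey) ?? '') + delta.text)
    },
    textFor(runId: string, spanId: string): string {
      return textBySpan.get(JSON.stringify([runId, spanId])) ?? ''
    },
  }
}
```

Keying by all three ids lets a reconnecting client replay notifications without duplicating text, and it avoids refetching the whole run for every chunk.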
