Observability
Canonical graph record contract shared by Crux runtimes, devtools, TUI, and the Go backend.
import {
  CruxGraphRecordBatchSchema,
  createHttpObservabilityTransport,
  createCruxRunId,
  createCruxSpanId,
  normalizeObservedError,
  observe,
} from '@crux/core/observability'

@crux/core/observability is the canonical graph contract for Crux execution traces. It defines the record shapes emitted by the TypeScript runtime and consumed by the local Go devtools backend.
The contract is lifecycle based:
- run:start
- span:start
- span:event
- artifact
- edge
- span:end
- run:end
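As a rough illustration of the lifecycle above, the following sketch checks that an in-order sequence of records opens a run, keeps span events inside open spans, and closes everything before the run ends. The `SketchRecord` shape and `checkLifecycle` helper are hypothetical and heavily simplified; the real record shapes are defined by CruxGraphRecordSchema, and the backend itself tolerates out-of-order delivery, so a check like this is only useful for something like a unit test over a locally captured trace.

```typescript
// Hypothetical, simplified record shape for illustration only.
type LifecycleType =
  | 'run:start'
  | 'span:start'
  | 'span:event'
  | 'artifact'
  | 'edge'
  | 'span:end'
  | 'run:end'

interface SketchRecord {
  type: LifecycleType
  runId: string
  spanId?: string // assumed to be present on span-scoped records
}

// Returns a list of problems; an empty list means the sequence is well formed.
function checkLifecycle(records: SketchRecord[]): string[] {
  const problems: string[] = []
  const openSpans = new Set<string>()
  let runOpen = false
  let runEnded = false

  for (const record of records) {
    if (runEnded) problems.push(`${record.type} after run:end`)
    switch (record.type) {
      case 'run:start':
        if (runOpen) problems.push('duplicate run:start')
        runOpen = true
        break
      case 'span:start':
        if (!runOpen) problems.push('span:start before run:start')
        if (record.spanId) openSpans.add(record.spanId)
        break
      case 'span:event':
        if (!record.spanId || !openSpans.has(record.spanId)) {
          problems.push('span:event outside an open span')
        }
        break
      case 'span:end':
        // Set.delete returns false when the span was never opened.
        if (!record.spanId || !openSpans.delete(record.spanId)) {
          problems.push('span:end without span:start')
        }
        break
      case 'run:end':
        openSpans.forEach((id) => problems.push(`span ${id} still open at run:end`))
        runEnded = true
        break
      default:
        // artifact and edge records are not order-checked in this sketch.
        break
    }
  }
  return problems
}
```

A trace that ends the run while a span is still open would report that span, which is the same situation the backend reconciles on the read side rather than rejecting.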
The backend turns these write records into read models such as run lists, RunDetail, resource activity, artifact previews, filters, and search facets. Web devtools and the TUI should read those backend models instead of inferring graph semantics themselves.
RunDetail is the default one-run inspection model. It includes the rooted presentation tree, flattened rows, display labels, attached details, events, artifacts, relation overlays, diagnostics, facets, aggregate metric rollups, status rollups, inspection sections, source metadata, request/context composition, and a span index. Detail ownership is resolved centrally by the Go service: edges of type explains and presentation.ownerSpanId attach prompt/context/memory/routing metadata to the generation, agent, tool, step, or composition they explain, while chronology is only a fallback. Routing decisions fold onto their selected concrete generation even when the canonical graph wraps the generation under routing.router / routing.cascade. Quiet constraint, guardrail, citation, scoring, and security warning spans fold as safety/detail evidence, while execution-changing governance remains visible. If a collector only receives completion-side records for a span, RunDetail keeps them as details instead of inventing an anonymous operation row.
Generation nodes and aggregating nodes can include request?: CruxRunDetailRequest. Exact generation requests are built from the consumed request-shaped messages artifact, linked context contributions, linked prompt budget, and request tool names. The backend also collects context contributions and prompt budgets produced under the nearest enclosing request scope before the generation starts, which covers framework agents whose Crux prompt resolves under agent.run before the provider stream begins.

Nested framework agent steps that only emit output-shaped message artifacts inherit the nearest enclosing generation request and expose it with mode: "inherited" plus representative.strategy: "nearest-ancestor-request". When Convex Agent emits both call-args and thread-context request artifacts, the backend prefers thread-context and preserves prior-turn fields such as allMessages, recent, inputMessages, inputPrompt, existingResponses, and search on request.messages; inherited agent steps also include earlier sibling generation outputs as previousStepMessages.

Aggregating run, stream, agent, composition, and flow nodes use the backend-selected final descendant generation as the representative effective request and include turns[] for the descendant generation breakdown.

request.basePrompt.sourceId uses the concrete prompt id when known and otherwise reports messages.system / messages.prompt; base prompt segments come from systemBlocks, which preserve explicit segments and infer direct string-template interpolation from unambiguous primitive input values. request.modelSummary groups concrete generation models for exact, inherited, and aggregate requests, marks mixed-model aggregates, and turns[] carries per-generation model, provider, status, and promptId; flattened rows[] carries the row's effective model and provider as well.

Convex Agent aggregate and per-step generation spans emit their configured languageModel so framework turns have model provenance, while nested tool-call or flow generations retain their own model identity. Clients should render these fields directly instead of walking descendant artifacts or deduping context contributions.
RunDetail also reconciles missing terminal lifecycle records from reliable canonical signals. Expired operation.deadline events mark the timed operation and its still-open ancestors as incomplete observability in the presentation tree, while future deadlines keep active long calls from being marked stale too early. Execution-changing governance rolls ancestors up to blocked, and intentional waits roll ancestors up to suspended. The canonical records remain unchanged. This is a telemetry diagnostic, not an application error.
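The deadline reconciliation described above can be pictured with a small read-side sketch. Everything here is an assumption made for illustration: the `SketchSpan` fields, the status names, and the `reconcileDeadlines` helper are hypothetical, and the real reconciliation lives in the Go backend. The sketch only shows the rule itself: an expired deadline marks the timed span and its still-open ancestors as incomplete, while a future deadline leaves them open.

```typescript
// Hypothetical read-side span shape; field names are assumptions for this sketch.
interface SketchSpan {
  id: string
  parentId?: string
  ended: boolean // true once a span:end record was received
  deadlineAt?: number // epoch milliseconds taken from an operation.deadline event
}

type SketchStatus = 'ended' | 'open' | 'incomplete'

function reconcileDeadlines(spans: SketchSpan[], now: number): Map<string, SketchStatus> {
  const byId = new Map<string, SketchSpan>()
  const status = new Map<string, SketchStatus>()
  spans.forEach((span) => {
    byId.set(span.id, span)
    status.set(span.id, span.ended ? 'ended' : 'open')
  })

  spans.forEach((span) => {
    const expired = !span.ended && span.deadlineAt !== undefined && span.deadlineAt < now
    if (!expired) return // future deadlines keep long calls from being marked stale

    // Walk from the timed operation up through its ancestors.
    const visited = new Set<string>()
    let current: SketchSpan | undefined = span
    while (current !== undefined && !visited.has(current.id)) {
      visited.add(current.id)
      // Only still-open spans change; the input records are never mutated.
      if (!current.ended) status.set(current.id, 'incomplete')
      current = current.parentId !== undefined ? byId.get(current.parentId) : undefined
    }
  })
  return status
}
```

The function returns a separate status map and leaves its input untouched, mirroring the point that the canonical records remain unchanged and only the presentation changes.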
Contextual retrieval, memory, and embedding spans fold into attached details when they are request-input evidence for a generation. Operational retrieval inside a tool, flow, composition, or agent boundary remains visible even when that branch lives under an agent generation stream; retrieval query/embed internals can still fold into the retrieval pipeline node.
Error evidence
Thrown errors are represented with a shared normalized shape. observe.span() and observe.openSpan().error() attach:
- A compact error summary to the terminal span:end or run:end.
- A span:event named exception with OpenTelemetry-style attributes.
- An error.stack artifact when a stack trace exists.
- An error.raw artifact containing a bounded, redacted, JSON-safe representation of the thrown value.
The Go backend promotes span.error, error.stack, and error.raw into RunDetail.inspection.errors. Web devtools and the TUI render that section for any primitive, including tools, generation, retrieval stages, flow steps, eval cases, routing attempts, and custom spans.
Crux only uses exception evidence for actual thrown failures. Non-throwing outcomes stay in their domain-specific records: approval denial is tool.approval plus model-facing tool output, guardrail blocks are guardrail.report / guardrail.blocked, constraint retries are constraint.retry, retrieval zero hits are result counts and retrieval.hits, invalid citations are citation.report, cascade tier rejection is routing metadata, flow suspension/cancellation is flow status, and stream length or content_filter is a finish reason.
Runtime bridge and eval-runner failures can happen outside a normal span. Those responses carry the same normalized details under command.error.details, so UI and TUI surfaces can show name, message, stack, phase, kind, and raw details even when no span exists.
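To make the idea of normalized error evidence concrete, here is a minimal standalone sketch of turning an unknown thrown value into a name, message, optional stack, and a bounded, redacted raw preview. It is not the implementation of normalizeObservedError() or toSafeJsonValue(): the `SketchErrorEvidence` shape, the redaction pattern, and the length bound are all assumptions chosen for illustration.

```typescript
// Hypothetical evidence shape; the real normalized shape may differ.
interface SketchErrorEvidence {
  name: string
  message: string
  stack?: string
  raw: string // bounded, redacted JSON text
}

const REDACTED_KEYS = /token|secret|password|authorization/i // assumed redaction rule
const MAX_RAW_LENGTH = 2000 // assumed bound

function sketchNormalizeError(thrown: unknown): SketchErrorEvidence {
  const isError = thrown instanceof Error
  // Error fields are non-enumerable, so copy name and message explicitly.
  const source = thrown instanceof Error
    ? { ...thrown, name: thrown.name, message: thrown.message }
    : thrown

  const seen = new WeakSet<object>()
  const json: string | undefined = JSON.stringify(source, (key, value) => {
    if (REDACTED_KEYS.test(key)) return '[redacted]'
    if (typeof value === 'object' && value !== null) {
      // Guards against cycles; also collapses repeated references.
      if (seen.has(value)) return '[circular or repeated]'
      seen.add(value)
    }
    return value
  })
  // JSON.stringify yields undefined for values such as undefined or functions.
  const text = json === undefined ? String(thrown) : json
  const raw = text.length > MAX_RAW_LENGTH ? text.slice(0, MAX_RAW_LENGTH) + '...' : text

  if (thrown instanceof Error) {
    return { name: thrown.name, message: thrown.message, stack: thrown.stack, raw }
  }
  return {
    name: isError ? 'Error' : 'NonErrorThrown',
    message: typeof thrown === 'string' ? thrown : raw,
    raw,
  }
}
```

Because JavaScript allows throwing any value, the non-Error branch matters in practice: a thrown string or plain object still produces inspectable evidence instead of an empty error card.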
Exports
| Export | Purpose |
|---|---|
| CruxGraphRecordSchema | Zod schema for one graph record |
| CruxGraphRecordBatchSchema | Zod schema for a batch of records |
| CRUX_OBSERVABILITY_SCHEMA_VERSION | Current graph schema version |
| createCruxRunId() | Create a branded run id |
| createCruxSpanId() | Create a branded span id |
| createCruxArtifactId() | Create a branded artifact id |
| normalizeObservedError() | Normalize an unknown thrown value into safe inspectable error evidence |
| observedErrorSummary() | Produce the compact terminal span/run error summary |
| toSafeJsonValue() | Convert unknown values into bounded, redacted JSON-safe previews |
Built-in primitives use canonical names such as generation.call, generation.stream, prompt.resolve, prompt.budget, context.predicate, context.resolve, memory.read, memory.write, retrieval.pipeline, retrieval.query, retrieval.stage, indexing.pipeline, ingest.parse, corpus.sync, embedding.call, cache.lookup, cost.record, compaction.run, eval.run, eval.case, scoring.judge, citation.check, feedback.record, workspace.operation, plan.operation, task.operation, skill.load, security.warning, routing.router, routing.cascade, fallback.attempt, constraint.check, constraint.retry, guardrail.run, agent.run, flow.run, flow.step, flow.suspension, composition.parallel, composition.pipeline, composition.consensus, composition.swarm, handoff.prepare, delegate.invoke, and tool.call.
Custom edge types and artifact kinds must use the custom.* namespace. Unknown custom values are preserved and can be displayed generically, but specialized indexing and layouts are reserved for canonical values.
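The namespace rule can be sketched as a small classifier. This is an illustration under assumptions, not the schema's actual validation logic: `CANONICAL_KINDS` is a hand-picked subset of the artifact kinds named in this document, and `classifyKind` plus the `'unrecognized'` outcome are hypothetical names.

```typescript
// Illustrative subset only; the full canonical list is defined by the schema.
const CANONICAL_KINDS = new Set<string>([
  'context.contribution',
  'prompt.budget',
  'retrieval.hits',
  'routing.report',
])

type KindClass = 'canonical' | 'custom' | 'unrecognized'

function classifyKind(kind: string): KindClass {
  if (CANONICAL_KINDS.has(kind)) return 'canonical'
  // custom.* needs a non-empty suffix after the namespace prefix.
  const prefix = 'custom.'
  if (kind.startsWith(prefix) && kind.length > prefix.length) return 'custom'
  return 'unrecognized'
}
```

A client following this split would give canonical kinds their specialized layout and fall back to a generic preview for anything classified as custom.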
Common detail artifacts use stable kinds so RunDetail.inspection can route them without UI-specific parsing:
| Artifact kind | Primary payload |
|---|---|
| context.contribution | Context inclusion state, source, primitive kind, priority, tokens, cache status, exclusion reason, segmented static/dynamic text, and contributed tool names in injectedTools. |
| prompt.budget | Prompt token budget totals and dropped context contribution previews. |
| retrieval.hits | Query, limit, returned hits, hit ranks, source/chunk ids, scores, and pipeline stage summaries. |
| citation.report / score.report | Grounding markers, optional output anchors (start, end, outputQuote), score verdicts, judge rationales, and pass/fail detail. |
| composition.report | Parallel branches, pipeline stages, consensus votes, or swarm handoff path. |
| handoff.payload / delegate.report | Transfer contract data, hop metadata, input/output sizes, and delegate result preview. |
| constraint.report / guardrail.report | Constraint attempts/retries and guardrail action reports with before/after previews when available. |
| routing.report / cache.report | Router/cascade/fallback choices, full cascade tier ladders including skipped/not-reached tiers, evaluator notes/confidence/budgets, semantic cache hits/writes, and saved work. |
| compaction.report | Before/after token counts and summary previews. |
| memory.snapshot / memory.recall / memory.diff | Memory state snapshots, recalled block previews (blockKind, key, preview, score), and before/after write summaries with added/removed block summaries. |
| embedding.report / indexing.report / ingest.report / corpus.report | Embedding shape/cache metrics, indexing totals/stage counts, loader status, and corpus source ledger summaries. |
| security.report | Prompt-injection warning severity, pattern, field/location, action, and bounded preview. |
Runtime emitters
Use observe.run() for user-facing execution roots and observe.span() for inspectable work inside a run. Standalone spans automatically create an implicit run, so helpers like pipeline, consensus, or swarm can be called directly and still produce a complete trace.
Crux's built-in orchestration helpers emit these records automatically. parallel() creates sibling agent.run spans, pipeline() records flow.step plus nested agent.run spans, flow() / withFlow() records flow.run, step children, and flow.suspension timeline markers for intentional waits, consensus() nests its voter fanout, and swarm / delegate handoffs create handoff.prepare spans, canonical input/output artifacts, handoff.payload artifacts, and relation edges.
Generation and streaming orchestration record operation deadlines when a timeout is configured. @crux/ai passes timeoutMs into core orchestration, which records timeoutMs / deadlineAt and emits operation.deadline before the provider call begins.
Prompt and safety helpers emit the same graph contract automatically:
- prompt.resolve() opens prompt.resolve, records conditional context inclusion/exclusion as context.predicate, and records resolved context text as context.resolve spans with context.contribution artifacts. Contributions that provide tools include injectedTools, including dropped text contributions whose tools still participate in the request. Direct injectables, retrievers/grounding, memory, and blackboards emit tool-only context.contribution previews when they contribute tools without resolved text. Segmented system content preserves segments: { text, dynamic, source? }[], staticTokens, and dynamicTokens on context previews, inspect parts, and budget-dropped previews. Generation spans that consume the resolved prompt link back to included context artifacts and prompt-budget artifacts, and their messages artifact includes request toolNames; the Go read model projects those records into RunDetailNode.request. Token-budget decisions emit prompt.budget artifacts with dropped contribution previews.
- Memory blocks and blackboards emit memory.read/memory.write spans, attach memory.snapshot artifacts when a read or write has inspectable state, attach memory.recall artifacts for non-empty read result previews, attach memory.diff artifacts for before/after write summaries, and connect those artifacts with semantic memory edges. Empty reads still emit the span but omit memory.recall, so UIs should not render a recalled-block card for zero-result reads. Raw namespaces are represented by namespaceHash; blackboard snapshots and diffs carry memoryType: "blackboard".
- Adapter-managed tools, including live @crux/ai tool calls, emit tool.call spans, consume tool.args artifacts, produce raw and model-facing tool.result artifacts, and record thrown execute() failures as rich error evidence on the same tool span while preserving the model-facing error output. Model-emitted tool intents attach to the generation as tool.request artifacts and are linked to the eventual tool execution by tool call id. Convex Agent streams expose an outer generation.stream plus child generation.call spans for each AI SDK step, so tool-call turns and later text turns are visible as consecutive streamed generations. Convex Agent fallback tool-call parts follow the same contract, so stop-condition tools remain inspectable even when no handler executes. Approval request, approval, denial, and token mismatch paths emit tool.approval spans.
- Retrieval calls emit retrieval.query spans with bounded retrieval.hits artifacts and retrieval.returned edges. Retrieval pipelines open a parent retrieval.pipeline span and record fanout/query/hit stages as child retrieval.stage spans with inspectable stage output artifacts.
- Indexing and corpus helpers emit indexing.pipeline, ingest.parse, and corpus.sync spans. Document transforms, chunkers, chunk transforms, dry runs, cache outcomes, loader results, and source-ledger summaries are visible through child spans plus indexing.report, ingest.report, and corpus.report artifacts.
- Embeddings emit embedding.call spans with provider name, kind, dimensions, batch shape, usage, cost, governance metrics, and bounded embedding.report artifacts. Configured embedding caches emit nested cache.lookup spans with hit/miss/write counts and per-entry hit/miss events.
- Semantic cache middleware emits cache.lookup spans for lookup, write, and skip decisions. Hit, miss, write, and skip events carry prompt id, scope hash, version, query hash, score, result kind, and timing metadata; hit/write decisions also attach cache.report artifacts.
- Cost tracking emits cost.record spans with call attribution, token/cost totals, and cost.warn / cost.limit events. Token budget checks emit prompt.budget spans with source breakdowns and pressure levels. Compaction helpers emit compaction.run spans and compaction.report artifacts with before/after token counts, compression ratios, focus, model labels, and bounded summary previews.
- Quality runs emit eval.run spans and eval.case children for each case/variant. LLM judges emit scoring.judge spans with bounded score.report artifacts. Citation validation emits citation.check spans with bounded citation.report artifacts; structured citations may carry output-text anchors (start, end, and outputQuote) so clients can place inline superscripts without reparsing the generated answer. Feedback inbox writes emit feedback.record spans with bounded feedback artifacts.
- Workspaces emit workspace.operation spans for list/read/write/edit/delete operations, with namespace hashes and bounded output artifacts. Plans and tasks emit plan.operation / task.operation spans for mutations, including task create/update/remove/discard paths. File and registry skills emit skill.load spans, and configured prompt-injection diagnostics emit security.warning spans plus security.report artifacts.
- Router, cascade, and fallback wrappers emit routing.router, routing.cascade, and fallback.attempt spans. Router spans record classification and selected route/model; cascade spans record attempted tiers, rejection/acceptance, budget exits, provider errors, and full ordered configured tier metadata for skipped/not-reached tiers; fallback attempt spans record model attempts, bounded error classifications, success metadata, and fallback.attempt edges between failed and next attempts. Each path attaches a routing.report artifact for Run Detail routing cards.
- The Safety session's constraint phase opens a grouped constraint.check span, records each individual check as a child span, attaches constraint.report artifacts, and emits constraint.retry spans plus constraint.retry edges when feedback triggers regeneration.
- The Safety session's guardrail phases open grouped and per-guard guardrail.run spans, attach guardrail.report artifacts for pass, warn, redact, transform, hold, and block actions, and emit guardrail.blocked edges for blocks.
import { observe } from '@crux/core/observability'

await observe.run({ name: 'support reply', rootPrimitive: 'agent.run' }, async () => {
  await observe.span({ name: 'retrieve docs', family: 'retrieval', primitive: 'retrieval.query' }, async () => {
    observe.event({ name: 'query.built', attributes: { terms: ['refund', 'policy'] } })
  })
})

The runtime uses async context propagation on Node.js and preserves parent span stacks across normal async work. For async work that crosses a boundary where context is not preserved, pass an explicit captured context:
const context = observe.captureContext()

queueMicrotask(() => {
  void observe.withContext(context, async () => {
    await observe.span({ name: 'delegate', family: 'delegate', primitive: 'delegate.invoke' }, async () => {
      // delegated work
    })
  })
})

Delivery
The default runtime path is non-blocking: transport failures are captured as diagnostics and never throw into user code. In serverless runtimes, await a bounded flush before returning so queued records are not killed with the request:
await observe.flush({ timeoutMs: 5000 })

Observability delivery is queued into batches. The first delivery starts immediately for live devtools, and later records coalesce per microtask before being sent as bounded concurrent batches. This avoids head-of-line blocking when a collector, tunnel, or Convex network hop is slow. HTTP delivery normalizes unknown preview values into JSON-safe shapes and isolates rejected records inside a failed batch, so one malformed or oversized detail artifact does not prevent terminal lifecycle records from reaching the backend. The Go observability backend reconciles out-of-order lifecycle records by stable ids and timestamps. The runtime never blocks user code on normal emits, but flush() and shutdown() wait for pending deliveries so serverless and Convex actions can preserve span starts, ends, artifacts, and events before the worker exits.
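The isolation behavior described above can be sketched in a simplified, synchronous form. This is not the transport's implementation: real delivery is asynchronous and concurrent, and the `SendBatch` callback, `chunk`, and `deliverWithIsolation` are hypothetical names. The sketch only shows the idea that a failed batch is retried record by record so that one bad record does not take its neighbors down with it.

```typescript
// Hypothetical sender: returns true when the collector accepts the whole batch.
type SendBatch<T> = (batch: T[]) => boolean

// Split records into bounded batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

function deliverWithIsolation<T>(
  records: T[],
  batchSize: number,
  send: SendBatch<T>,
): { delivered: T[]; rejected: T[] } {
  const delivered: T[] = []
  const rejected: T[] = []
  for (const batch of chunk(records, batchSize)) {
    if (send(batch)) {
      delivered.push(...batch)
      continue
    }
    // The batch failed: retry each record alone to isolate the bad ones.
    for (const record of batch) {
      (send([record]) ? delivered : rejected).push(record)
    }
  }
  return { delivered, rejected }
}
```

With a batch containing a span start, an oversized artifact, and a span end, only the artifact ends up rejected; the two lifecycle records still reach the collector, and in this sketch the rejected record would be surfaced as a diagnostic rather than thrown into user code.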
Convex actions can use the helpers from @crux/convex:
import { withObservabilityFlush } from '@crux/convex/observability'
// internalAction comes from the app's generated Convex server module.
import { internalAction } from './_generated/server'

export const run = internalAction({
  args: {},
  handler: withObservabilityFlush(async (ctx, args) => {
    // Crux work here
  }),
})

@crux/convex/server actions flush automatically by default. Pass observabilityFlushTimeoutMs: false only when an outer serverless boundary already flushes.
Use createHttpObservabilityTransport() to send canonical batches to the local Crux backend endpoint:
import { config } from '@crux/core'

config({
  observability: {
    serverUrl: 'http://localhost:4400',
    delivery: { maxPendingDeliveries: 1000 },
  },
})

The observability domain is explicit export configuration. config() with no observability block does not install telemetry, cloud upload, or raw-content capture. Use devtools only for local UI/control/tunnel/bridge behavior.
The HTTP transport posts { records } to /api/observability/records. The Go backend owns validation, persistence, read-model building, filtering, search, and subscriptions. Devtools keeps terminal run-detail pages refreshing for a short grace window, which catches late Convex/serverless flushes that arrive after the terminal run update. Convex Agent container streams fold into details in the Go read model, while step-level streamed generation turns, tools, handoffs, and delegated flows render as chronological agent children. Promoted tool executions preserve source.canonicalParentSpanId and sort after their request generation by relation, even when Convex action timestamps arrive slightly out of order.
Live token visualization uses the same observability channel without turning every chunk into a whole-run refetch. span:event records named token.delta are persisted as canonical events and also broadcast as append-only token.delta notifications keyed by runId, spanId, and eventId.
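On the client side, append-only notifications keyed by runId, spanId, and eventId lend themselves to a small idempotent buffer. The sketch below is one possible way a client could consume them, not the devtools implementation: the `TokenDelta` payload field `text` and the `TokenStreamBuffer` class are assumptions, and the sketch assumes notifications for a span arrive in order.

```typescript
// Hypothetical notification shape; `text` is an assumed payload field.
interface TokenDelta {
  runId: string
  spanId: string
  eventId: string
  text: string
}

class TokenStreamBuffer {
  private readonly seen = new Set<string>()
  private readonly textBySpan = new Map<string, string>()

  // Appends a delta once; returns false for an already-seen event id.
  append(delta: TokenDelta): boolean {
    const eventKey = `${delta.runId}/${delta.spanId}/${delta.eventId}`
    if (this.seen.has(eventKey)) return false
    this.seen.add(eventKey)
    const spanKey = `${delta.runId}/${delta.spanId}`
    this.textBySpan.set(spanKey, (this.textBySpan.get(spanKey) ?? '') + delta.text)
    return true
  }

  // Returns the text accumulated so far for one span.
  text(runId: string, spanId: string): string {
    return this.textBySpan.get(`${runId}/${spanId}`) ?? ''
  }
}
```

Deduplicating on the full key means a reconnect that replays recent notifications does not double-print tokens, and the view can update a single span's text without refetching the whole run.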