Devtools
Visual tracing UI for generations, memory, compaction, evals, and quality experiments — setup, CLI, and dashboard navigation.
Crux's devtools give you a real-time view into every part of the pipeline — what the model saw, why it saw it, how long it took, what tools it called, and what came back. Execution observability works via fire-and-forget HTTP POST, so it's safe for ephemeral runtimes like serverless functions and adds zero overhead when disabled.
Setup
For ordinary local Quality runs, you do not need devtools config. Start the local server and run Quality from the same machine; the CLI auto-attaches to loopback devtools origins only.
crux dev
crux quality run
Add devtools config only when application runtime code should send live records to a known local server, tunnel, or bridge endpoint:
config({
  devtools: {
    serverUrl: process.env.DEVTOOLS_URL,
    bridge: true, // optional live-runtime command plane for local Node apps
  },
})
The devtools key is for local UI/control/tunnel/bridge behavior. Production telemetry and remote collectors stay explicit through observability config or telemetry plugins.
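The loopback-only auto-attach rule described in Setup can be pictured with a small check like the one below. This is a sketch under assumptions: `isLoopbackOrigin` is a hypothetical helper name, and the exact set of hosts the CLI treats as loopback is not documented here. The sketch accepts `localhost`, the `127.0.0.0/8` block, and `::1`.

```typescript
// Hypothetical helper, not part of the Crux API.
// Assumption: "loopback" means localhost, 127.0.0.0/8, or ::1.
function isLoopbackOrigin(origin: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(origin).hostname.toLowerCase();
  } catch {
    return false; // not a parseable URL, so never auto-attach
  }
  // WHATWG URL keeps the brackets around IPv6 hostnames.
  if (hostname === "localhost" || hostname === "[::1]") return true;
  // IPv4 loopback block 127.0.0.0/8.
  return /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(hostname);
}
```

A tunnel or remote origin fails this check, which is the case where an explicit `devtools.serverUrl` would be needed.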
What gets instrumented
- Every generate()/stream() call — timing, tokens, results, finish reasons, and rich error evidence when execution throws
- Every .resolve() call — system message assembly details
- Agent model calls via @crux/ai/agent
- Quality experiments — every evaluation cell opens an observed run with full trace links
- Memory operations — every read/write across all memory types
- Compaction events — sliding window evictions with compression stats
- Budget checks — pressure level transitions
- Blackboard updates — field-level change tracking
- Retrieval pipeline stages — query planning, multi-query fanout, parent expansion, compression, diversity, decay
- Handoff preparations — input/output size tracking
- Judge scores — every llmJudge result with reasoning
CLI
The crux CLI is a native Go binary distributed via npm. It provides both print-and-exit commands for scripting/CI and an interactive TUI dashboard.
Server & dashboard
crux dev # start devtools server + open web UI
crux dev --tui # start server + interactive terminal dashboard
crux dev --tunnel # with public tunnel for cloud runtimes
crux dev --port 8080 # custom port
crux dev --no-open # start server without opening browser
Observability commands
All commands connect to a running crux dev server. Add --json to any command for machine-readable output.
crux traces # list recent traces
crux traces <id> # trace detail (tokens, tool calls, streaming)
crux traces --live # tail traces in real-time
crux traces --prompt <id> # filter by prompt
crux stats # aggregate statistics + prompt usage
crux stats --live # continuously updating
crux index # list Project Index definitions
crux index prompts # list prompts only
crux index definitions # list all discovered definitions
crux index diagnostics # list index diagnostics
crux index reindex # force source/.crux quality reindex
crux index <id> # detail for a specific item
crux inspect <prompt-id> # token breakdown from most recent trace
crux flows # runtime flow sessions
crux quality list # discovered evaluations
crux quality run # run all Quality evaluations
crux quality show <id> # experiment detail
crux quality progress <id> # recent runs + score series
Running Quality evaluations
crux quality list # discover evaluations without executing
crux quality run # run every discovered evaluation
crux quality run support.refunds # run one evaluation by id
crux quality run --ci # CI mode with exit codes
crux quality show <experiment-id> --json
crux quality cell-evidence <experiment-id> --case simple-refund --variant default --trial 0 --json
Global flags
--port <n> # devtools server port (default 4400)
--no-color # disable colored output (also respects NO_COLOR env)
--json # JSON output for scripting and LLM agents
Web dashboard
Run crux dev to open the web dashboard at http://localhost:4400. Everything updates in real-time over WebSocket — no page refreshes needed.
The run inspector reads backend-owned RunDetail by default. The canonical graph remains lossless, but RunDetail classifies spans into visible nodes or attached details so low-level routing, prompt, context, memory, cache, and cost spans attach to the visible operation they explain instead of becoming misleading root rows. Explicit semantic ownership wins through explains edges and presentation.ownerSpanId; sibling chronology is only a fallback. Completion-only lifecycle fragments are retained as inspectable details rather than anonymous top-level trace rows. Convex Agent streams render as AGENT -> GENERATE stream response -> GENERATE step / TOOL ... when the stream container carries useful structure, while redundant single-step stream wrappers fold into details. Flow suspensions render as visible suspension markers rather than stuck generated spans. Attached spans remain inspectable in the detail panel, and streaming token.delta notifications let the UI append live tokens without refetching the whole run on every chunk.
The visible root should match the user's actual initiating work. Use flow() only for real Crux flows; external framework turns such as Convex Agent chat responses should open an agent.run root/span. For direct Convex Agent tools, use wrapConvexTool(tool, { name }) so tool rows show names like research, writer, or optimizeSeo instead of provider-generated call ids while nested delegates remain correctly parented.
Timeline
The main view is a live trace list showing every generate() and stream() call as it happens.
- Session grouping — traces inside withSession() appear as collapsible groups; sessions are metadata/grouping, not execution nodes
- Flow nesting — flow().run() adds sub-groups with labeled steps within sessions
- Status indicators — running (spinner), success (green), error (red)
- Filtering — filter by prompt ID, model, or session using the search bar
- Click any trace to open its detail view
Standalone traces (outside any session) appear as individual rows. Most recent traces appear at the top.
Trace detail
Click a trace in the timeline to see its full execution:
- Token breakdown — system, prompt, completion tokens with per-context attribution
- Timing — total duration; time-to-first-token (TTFT) for streaming calls
- Model & provider — which model handled the call
- Input & output — the resolved prompt messages and the model's response
- Tool calls — each native-adapter or @crux/ai live invocation with arguments, raw result preview, model-facing toModelOutput() preview, duration, and output-size savings
- Source location — file, line number, and function name where generate() was called
- Streaming metrics — throughput (tokens/sec), chunk count, stream duration
- Errors — compact message, error type, stack trace, and raw safe details when a call, tool, stage, step, or runtime command throws
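To make the timing and streaming figures above concrete, here is a minimal sketch of how TTFT and throughput could be derived from three timestamps. The field names and the choice to measure stream duration from the first token are assumptions made for illustration; the devtools backend may define these metrics differently.

```typescript
// Hypothetical shape; not a Crux type.
interface StreamTiming {
  startedAtMs: number;      // when the stream() call began
  firstTokenAtMs: number;   // when the first chunk arrived
  finishedAtMs: number;     // when the stream closed
  completionTokens: number; // tokens produced by the model
}

function streamMetrics(t: StreamTiming) {
  const ttftMs = t.firstTokenAtMs - t.startedAtMs;
  // Assumption: stream duration excludes the wait for the first token.
  const streamDurationMs = t.finishedAtMs - t.firstTokenAtMs;
  const tokensPerSec =
    streamDurationMs > 0 ? t.completionTokens / (streamDurationMs / 1000) : 0;
  return { ttftMs, streamDurationMs, tokensPerSec };
}
```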
Stats panel
Aggregate metrics across all traces in the current session:
- Overview — total traces, success/error counts, average duration, error rate
- Tokens & cost — total and average token usage, cumulative cost
- Top prompts — most-called prompts ranked by invocation count
- Cost by model — visual bars showing spend distribution across models
- Latency percentiles — P50, P90, P99 response times
- Streaming — average TTFT, throughput across streaming traces
- Activity — memory read/write counts, retrieval stage counts/errors, handoffs, delegates
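For readers unfamiliar with latency percentiles, the sketch below computes P50, P90, and P99 with the nearest-rank method. The method is an assumption chosen for simplicity; this page does not state which percentile definition the stats panel uses.

```typescript
// Nearest-rank percentile over a list of durations (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

const durationsMs = [120, 80, 300, 95, 1500, 110, 105, 90, 100, 130];
const p50 = percentile(durationsMs, 50); // 105
const p90 = percentile(durationsMs, 90); // 300
const p99 = percentile(durationsMs, 99); // 1500
```

The single slow call (1500 ms) only shows up at P99, which is why the panel reports tail percentiles alongside the average.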
Project Index
Browse discovered prompts, contexts, tools, suites, evals, source locations, snippets, authored paths, relations, and diagnostics before the first run. crux dev rebuilds this Go-owned read model from source files and .crux/quality JSON on startup; runtime snapshots only enrich it. Use definition.source.file for file-tree grouping and definition.path for user-authored prompt/context/tool tree grouping. Memory and blackboard definitions include authored store bindings and Zod schemas when the indexer can safely resolve them, including workingState({ schema }) blocks nested in a memory({ blocks }) definition. definition.metadata.runtimeJoin is backend guidance for joining authored definitions to runtime spans: flow and flow-step joins use primitive/name or stepLabel, while runtime flowId and stepId remain execution correlation fields; memory blocks and blackboards join through the memory span shape. Click any item to inspect source metadata and relations. Use crux index reindex when you want to force a rebuild from the CLI.
While crux dev is running, source and config changes are picked up by the local watcher. Normal edits to known prompt, context, tool, suite, eval, and dependency-closure source files are planned incrementally, committed as AST/source patches first, and then enriched semantically in the background. Broad resolver changes such as crux.config.*, package.json, lockfiles, workspace files, unsafe deletions, unknown files, and incomplete graph evidence intentionally fall back to a full reindex. Inspect the latest watch decision at GET /api/project/index/watch; the response includes the run id, plan kind, fallback reason, affected counts, patch counts, queue coalescing counts, semantic status, and bounded phase timings.
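The incremental-versus-full decision described above can be sketched as a small classifier. Everything here is illustrative: `planReindex`, the `WatchPlan` shape, and the list of lockfile and workspace basenames are assumptions, and the real watcher also considers unsafe deletions and incomplete graph evidence, which this sketch omits.

```typescript
// Hypothetical plan shape, loosely modeled on the "plan kind" and
// "fallback reason" fields mentioned for the watch endpoint.
type WatchPlan =
  | { kind: "incremental"; files: string[] }
  | { kind: "full"; fallbackReason: string };

// Assumed set of "broad resolver" basenames; the real list may differ.
const BROAD_BASENAMES = new Set([
  "package.json",
  "package-lock.json",
  "pnpm-lock.yaml",
  "yarn.lock",
  "pnpm-workspace.yaml",
]);

function planReindex(changed: string[], knownSources: Set<string>): WatchPlan {
  for (const file of changed) {
    const base = file.split("/").pop() ?? file;
    if (base.startsWith("crux.config.") || BROAD_BASENAMES.has(base)) {
      return { kind: "full", fallbackReason: `broad resolver change: ${file}` };
    }
    if (!knownSources.has(file)) {
      return { kind: "full", fallbackReason: `unknown file: ${file}` };
    }
  }
  // Every changed file is a known source, so a patch-level plan is safe.
  return { kind: "incremental", files: changed };
}
```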
The Project Index remains the authored/source plane. Runtime contribution evidence is available as a separate local read model at GET /api/project/index/observed-injection?limit=250, which aggregates recent context.contribution, prompt.budget, and redacted prompt.input artifacts into observed source ids, branch labels, injected tools, budget drops, prompt ids, run refs, static/runtime comparison evidence, and prompt input-key validation summaries. UIs can use it to compare possible injection with observed injection without treating recent traces as the authored contract.
Quality assets
The Quality screens merge authored assets with local workbench state. Code and JSON suites named *.suite.* appear in the suites/datasets views as soon as the Project Index is ready, even before an experiment has been run. Committed cassette fixtures named *.cassette.json appear in the cassettes view alongside local .crux/quality/cassettes records. Local .crux/quality suite records take precedence for the same suite id, so annotations or UI-created cases are not overwritten by source discovery.
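The precedence rule for suites with the same id can be illustrated with a short merge function. This is a sketch only: `SuiteRecord` and `mergeSuites` are hypothetical names, and the real records carry more fields than shown here.

```typescript
// Hypothetical record shape for illustration.
interface SuiteRecord {
  id: string;
  origin: "source" | "local"; // discovered from source vs. .crux/quality
  cases: string[];
}

function mergeSuites(discovered: SuiteRecord[], local: SuiteRecord[]): SuiteRecord[] {
  const byId = new Map<string, SuiteRecord>();
  for (const suite of discovered) byId.set(suite.id, suite);
  // Local records are written last, so they win for the same suite id.
  for (const suite of local) byId.set(suite.id, suite);
  return Array.from(byId.values());
}
```

Because the local record replaces the discovered one wholesale, annotations and UI-created cases survive a source rediscovery.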
Memory explorer
The Memory nav item opens a dedicated view for browsing memory activity and inspectable snapshots. Runtime memory and blackboard reads/writes come from canonical memory.read / memory.write spans, memory.snapshot artifacts, and semantic memory edges. The Go backend joins those records with Project Index definitions to expose store details, schemas, source links, read/write trends, and cross-store operation history when the runtime captured the relevant metadata. Span start/end attributes are merged by span id, so terminal read/write measurements do not erase resource identity fields such as memoryId, memoryType, blockId, or blockKind. When Runtime Bridge live inspection is available, the same memory detail response includes inspection.status: "ok", inspection.source: "mixed", and live store entries. When the bridge is unavailable, ambiguous, or fails, the response keeps the projected state and includes inspection.status: "partial" with a reason, message, and docs link. Missing optional fields mean "not captured yet", not zero or empty.
- Current state — type-specific visualization: field-level view for working-state blocks, entry tables for fact/episode/procedure blocks, JSON tree for blackboard fields
- History — chronological event log with inline state diffs
A time-travel slider lets you scrub through historical writes to see state at any point in time.
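The rule that terminal measurements must not erase resource identity fields can be sketched as a merge that skips missing values. The helper name and the `durationMs` attribute are assumptions; `memoryId` and `memoryType` come from the description above.

```typescript
type SpanAttrs = Record<string, string | number | undefined>;

// Merge start and end attributes for one span id.
// A field that is absent (undefined) on the end record means
// "not captured", so it never overwrites a value from the start record.
function mergeSpanAttributes(start: SpanAttrs, end: SpanAttrs): SpanAttrs {
  const merged: SpanAttrs = { ...start };
  for (const key of Object.keys(end)) {
    const value = end[key];
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```

This mirrors the wider convention on this page that a missing optional field means "not captured yet" rather than zero or empty.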
Durable inventories of existing memory stores, workspaces, or plans belong to the index/state plane. Runtime touches of those resources still appear in the observability graph. The library views read GET /api/observability/resources/:family, which is assembled by the Go backend from spans, artifacts, and edges. That keeps memory snapshots, workspace outputs, plan previews, and task mutations inspectable while the web UI and TUI remain dumb readers.
Quality workbench
When you run crux quality run, experiment records appear in the Quality workbench: a run list with status, case counts, pass/fail state, duration, score summaries, and expanded evidence for failed cells. The cell drawer reads backend-owned evidence records, including assertion outcomes, normalized checks, source frames, baseline deltas, and trace hotspots when the run retained those details.
Security tab
The Security tab lists detected injection attempts across your prompts. It shows pattern frequency cards (clickable filters), an attack timeline correlated to traces, input previews with matched patterns highlighted, and a vulnerability matrix showing which prompts receive which types of attacks. Security warnings are emitted automatically when securityWarnings is enabled (the default in development).
Terminal dashboard (TUI)
For a terminal-based alternative, use crux dev --tui. It opens a split-pane layout with live traces on the left and aggregate stats on the right. The tunnel URL appears in the header bar when a tunnel is active.
| Key | Action |
|---|---|
| j / k | Navigate up/down |
| d / u | Half-page jump |
| / | Filter traces |
| Enter | Inspect selected trace |
| c | Browse index |
| Tab | Switch index tab (prompts/contexts/tools) |
| Esc | Go back |
| q | Quit |
How it works
The devtools architecture has two sides — the CLI that runs the Go server, and the SDK that sends canonical graph records to it.
The server
The crux CLI is a native Go binary. It owns the HTTP API, WebSocket/SSE subscriptions, SQLite persistence, graph/resource read-model assembly, the TUI, and the embedded web UI assets. File-backed observability stores use SQLite WAL, a busy timeout, and a small connection pool so multiple in-flight flushes and UI reads can run concurrently; in-memory stores remain single-connection for tests.
When you run crux dev:
- The CLI starts the Go HTTP server on the selected port
- Opens the embedded web dashboard, or starts the Go TUI with --tui
- Persists canonical graph records to the local .crux SQLite store
- Builds the Project Index by indexing source files and .crux/quality JSON with a bounded embedded Node worker
- Builds canonical graph, RunDetail, and resource activity read models in Go services
- Broadcasts backend-owned read-model updates to connected UIs
The Go process owns the server and read models. Node.js is spawned only for bounded helper workers: Project Index indexing, eval execution, and lazy source resolution.
The SDK
Your application connects to the devtools server through config() or enableDevtools(). When serverUrl is set, the SDK installs the canonical observability transport, posts ordered graph record batches to POST /api/observability/records, and sends runtime prompt/context/tool snapshots through POST /api/index/snapshot to enrich the Project Index. Delivery is non-blocking, starts immediately for live span updates, can have multiple bounded HTTP deliveries in flight, and swallows failures so devtools never slow down or break application code. The Go server keeps list/dashboard/lifecycle reads on cheap run-summary paths and reserves exact graph projection for single-run inspection. Generation streams close their spans from the raw stream's completion or error signal, so stopped streams and failed streams do not stay visually running while later usage metadata is collected. In serverless runtimes, call observe.flush() or use the @crux/convex helpers before the action returns so queued records are not killed with the worker.
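The delivery behavior described above (non-blocking, bounded concurrency, failures swallowed) can be sketched with a small queue. This is not the SDK's transport: `FireAndForgetQueue`, the injected `send` function, and the default limit of two in-flight deliveries are all assumptions made for illustration.

```typescript
type SendBatch = (batch: unknown[]) => Promise<void>;

// Hypothetical queue: starts deliveries immediately, caps how many run
// at once, and never lets a transport failure reach the caller.
class FireAndForgetQueue {
  private pending: unknown[][] = [];
  private inFlight = 0;

  constructor(private send: SendBatch, private maxInFlight = 2) {}

  enqueue(batch: unknown[]): void {
    this.pending.push(batch);
    this.pump();
  }

  private pump(): void {
    while (this.inFlight < this.maxInFlight && this.pending.length > 0) {
      const batch = this.pending.shift()!; // batches start in enqueue order
      this.inFlight++;
      let delivery: Promise<void>;
      try {
        delivery = this.send(batch);
      } catch {
        delivery = Promise.resolve(); // swallow synchronous throws
      }
      delivery
        .catch(() => {}) // swallow rejected deliveries
        .then(() => {
          this.inFlight--;
          this.pump(); // a slot freed up, start the next batch
        });
    }
  }
}
```

In a real transport, `send` would wrap an HTTP POST to `/api/observability/records`; the payload format is not specified on this page, so it is left out. Note that a queue like this does nothing to keep a serverless worker alive, which is why the flush call before returning still matters.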
The optional Runtime Bridge is the reverse command plane. With devtools.bridge: true, a long-lived Node runtime opens /ws/runtime so Go services can send typed local-dev commands such as store.read to the live app. Primitives register inspectable resources automatically, so devtools can ask for memory:<id> or blackboard:<id> and read the attached store instead of reconstructing state from traces. eval.run is also routed through the Go bridge service, using the embedded eval runner so quality persistence and observability stay on the normal eval path. Convex and serverless runtimes use framework-bound HTTP endpoints instead of long-lived sockets. Web devtools and the TUI still talk only to Go: they never call runtime peers directly.
Failed spans expose an errors inspection section built by the Go backend from span.error, exception, error.stack, and error.raw records. Devtools renders that section above primitive-specific output, so a failed tool shows the error before args/results, and a failed generation or retrieval stage can be inspected without switching to raw graph JSON. Status outcomes that are not thrown errors remain in their own sections: approval denials, guardrail blocks, constraint retries, retrieval zero hits, cascade tier rejections, flow suspensions, and stream finish reasons are visible but not mislabeled as exceptions.
For Convex, bind the bridge once from convex/http.ts:
import { httpRouter } from 'convex/server'
import { setup } from '@crux/convex'
import crux from '../crux.config'
import { components } from './_generated/api'
// Standard Convex HTTP router; convex/http.ts must default-export it.
const http = httpRouter()
// Bind the Crux bridge endpoints onto the router once.
setup(http, crux, {
  component: components.crux,
})
export default http
During crux dev, the Go server auto-discovers that HTTP peer from CRUX_BRIDGE_URL, CONVEX_SITE_URL, or Convex cloud URL variables such as CONVEX_URL / NEXT_PUBLIC_CONVEX_URL in the shell or project .env.local / .env. If only a .convex.cloud URL is present, Crux derives the matching .convex.site/crux/bridge URL before fetching the manifest.
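The URL derivation mentioned above amounts to swapping the host suffix and appending the bridge path. The function below is a sketch with a hypothetical name (`deriveBridgeUrl`); the real discovery logic lives in the Go server and may handle more cases.

```typescript
// Hypothetical helper: map a *.convex.cloud URL to the matching
// *.convex.site/crux/bridge URL. Returns null for anything else.
function deriveBridgeUrl(cloudUrl: string): string | null {
  let url: URL;
  try {
    url = new URL(cloudUrl);
  } catch {
    return null; // not a URL at all
  }
  const suffix = ".convex.cloud";
  if (!url.hostname.endsWith(suffix)) return null;
  const deployment = url.hostname.slice(0, -suffix.length);
  return `${url.protocol}//${deployment}.convex.site/crux/bridge`;
}
```

For example, a placeholder deployment URL `https://happy-otter-123.convex.cloud` maps to `https://happy-otter-123.convex.site/crux/bridge`.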
For user-facing resource inspection, clients call Go's resource APIs instead of bridge endpoints:
- GET /api/resources/capabilities
- GET /api/resources/{resourceId}
- GET /api/resources/{resourceId}/entries
The response is product-shaped: ok for data, partial when only projections are available, unavailable when a bridge/runtime is required, or error when a live command failed. Capabilities are UI hints for showing actions, not guarantees; every resource read still returns a structured reason and docs link when it cannot be served. Product screens should prefer domain read models when they exist: the Memory screen reads GET /api/memory/stores/{id} and uses its embedded inspection object instead of composing generic resource calls in the client.
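A client could model the four response states as a discriminated union so that every non-ok state is forced to carry its reason. The status names come from the description above; the field names `data`, `reason`, and `docsUrl` and the helper `describeResource` are assumptions, not the actual wire format.

```typescript
// Hypothetical client-side model of a resource read.
type ResourceResponse<T> =
  | { status: "ok"; data: T }
  | { status: "partial"; data: T; reason: string; docsUrl: string }
  | { status: "unavailable"; reason: string; docsUrl: string }
  | { status: "error"; reason: string; docsUrl: string };

function describeResource<T>(res: ResourceResponse<T>): string {
  switch (res.status) {
    case "ok":
      return "live data";
    case "partial":
      return `projected state only (${res.reason})`;
    case "unavailable":
      return `needs a bridge or runtime (${res.reason})`;
    case "error":
      return `live command failed (${res.reason})`;
  }
}
```

Switching on `status` rather than on capabilities matches the note above that capabilities are hints and each read reports its own outcome.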
Run lists and run details are intentionally different read paths. The web UI and TUI list newest runs from a bounded page and the backend only performs cheap count/identity enrichment there. A single run detail then reads the graph tables needed for the exact RunDetail projection. Raw record payloads are available through the graph/debug route, not the normal detail route.
Requirements
- @crux/local — npm package that ships the crux local runtime wrapper