@crux/local

Local Crux runtime for the Go devtools server, TUI, traces, Quality, index, and lint.

# Install
npm install -g @crux/local
# or download the binary from the releases page

Overview

The Crux local runtime is a Go binary that hosts the HTTP API, WebSocket/SSE subscriptions, React devtools UI, TUI, SQLite-backed observability services, index services, Quality run orchestration, and lint presentation. File-backed observability databases use SQLite WAL, busy timeouts, and pooled connections for concurrent flush/read workloads; in-memory databases stay single-connection for tests. Node.js is used only for bounded helper workers such as Project Index indexing, source resolution, and Quality evaluation execution.

All query commands support a --json flag for machine-readable output. Table output is the default for human consumption.

`crux dev`

Start the devtools server and open the web UI.

crux dev
crux dev --port 4500
crux dev --tui            # Interactive terminal dashboard
crux dev --tunnel         # Create a public tunnel
crux dev --no-open        # Don't open the browser

Flag	Default	Description
`--port`	`4400`	Server port
`--tunnel`	`false`	Create a public tunnel for remote access
`--no-open`	`false`	Don't automatically open the browser
`--tui`	`false`	Show interactive terminal dashboard (Bubbletea TUI)
`--startup-debug`	`false`	Show startup timing diagnostics

The Go server owns HTTP, WebSocket/SSE, read models, and UI serving. If the port is already in use, it automatically tries the next available port (up to +9).

Requires: Node.js >= 24 for helper workers that import project TypeScript.

The local HTTP API, in-process DirectClient, and WebSocket snapshot payloads share the same backend read-model registry. This keeps route parameters, query parsing, not-found handling, live invalidation messages, and initial snapshot payloads aligned between custom tooling and the embedded devtools UI.

Observability and quality run lists are newest-first pages by default so live web/TUI refreshes stay responsive on large local histories. Single-run detail endpoints remain exact. Use explicit limit/offset query parameters on the HTTP list endpoints when building custom local tooling that needs a different page size.

`crux traces`

List recent traces or show trace detail.

crux traces                        # List all recent traces
crux traces <trace-id>             # Show full trace detail
crux traces --prompt editor.seo    # Filter by prompt ID
crux traces --session abc123       # Filter by session ID
crux traces --live                 # Tail traces in real-time
crux traces --json                 # Output as JSON

Flag	Description
`--prompt`	Filter by prompt ID
`--session`	Filter by session ID
`--live`	Tail traces in real-time via WebSocket
`--json`	Output as JSON

Table columns: TIME, STATUS, PROMPT, MODEL, DURATION, TOKENS, COST

Detail view (with trace ID argument) shows: trace ID, provider, duration, session, role, error, token breakdown (input/output/cache), cost, fallback attempts, streaming metrics (TTFT, throughput), and tool calls.

`crux stats`

Show aggregate statistics across all traces.

crux stats              # Show current stats
crux stats --live       # Continuously update on new events
crux stats --json       # Output as JSON

Flag	Description
`--live`	Continuously update stats via WebSocket
`--json`	Output as JSON

Shows: execution counts (success/error/running), average duration, error rate, total tokens, total cost, average cost per call, streaming metrics, memory read/write counts, compaction counts, budget level, agent coordination stats (handoffs, delegates, blackboard updates), tool call counts, and security warnings.

Also displays a per-prompt usage breakdown table with call counts, errors, average duration, and total cost.

`crux cost`

Show tracked model spend from the latest cost report.

crux cost            # Show the cost breakdown
crux cost --json     # Output as JSON

Flag	Description
`--json`	Output as JSON

Prints the total spend and event count, then per-model and per-prompt breakdowns, followed by any threshold warn/limit events. Requires withCostTracking() in your Crux config — without it, no cost events are recorded and the command says so.

`crux config`

Render the effective Crux configuration: every domain config() accepts, with each value resolved and tagged by where it came from. crux config inspect imports your crux.config.ts in inert CRUX_INDEX=1 mode (no runtime side effects — no observability transport, no bridge, no store) so the view reflects your explicit overrides, not just defaults.

crux config inspect
crux config inspect --json
crux config inspect --cwd packages/backend
crux config inspect --config crux.config.ts

Flag	Description
`--json`	Output the full effective configuration as JSON
`--cwd`	Project root to inspect (default: nearest config/package)
`--config`	Crux config path relative to the project root
`--name`	Project name to include in worker resolution

The view is organized by config() domain, in interface order:

Project — root and package name
Config file — the resolved file, how it was located, its load status, and any import error
quality: — id, dir, include, exclude, redact, and the run defaults trials / concurrency / timeoutMs / replay (the full QualityConfig surface)
generation: — autoEscape, securityWarnings, and whether a tokenizer / middleware is bound
indexer: — extension trust mode and configured extensions
observability: — enabled, serverUrl, and whether a custom transport is bound
devtools: — serverUrl and whether a bridge is configured
persistence: — whether a global store is bound
lint: — selected profile and rule-override count
plugins: — installed plugin names
Discovered — a compact count of authored definitions / relations / evaluations (context, not config; the full index lives behind crux index)
Diagnostics — config-load and project-model findings

Every value carries a dim origin tag so a zero-config project reads as "these are the defaults Crux applied": (default) for a built-in default, (config) for an explicit value from crux.config.ts, (package.json) for package-derived values, and (set) for a bound non-serializable value (store, tokenizer, middleware, transport). The config-file status uses a colored glyph (✓ loaded, ✗ missing/import-failed, ● unrecognized). Paths are normalized for readability: the project root collapses to ~ and every other path is shown relative to it (.crux/quality, evals/...).

On an interactive terminal an animated loader is shown while the config resolves; color and the loader are suppressed under --no-color, NO_COLOR, or a non-TTY pipe, and --json emits the machine-readable form (each setting as { value, origin }, full absolute paths preserved). Worker lifecycle logs are silenced for this one-shot command — set CRUX_STARTUP_DEBUG=1 to surface them when troubleshooting.

A broken config never makes inspection fail: an import error degrades to an all-defaults view with an import-failed config status and the error surfaced under Diagnostics.

`crux index`

List Project Index definitions from the Go devtools service.

crux index                  # List all
crux index prompts          # List prompts only
crux index contexts         # List contexts only
crux index tools            # List tools only
crux index definitions      # List all discovered definitions
crux index diagnostics      # List index diagnostics
crux index reindex          # Rebuild from source files and .crux/quality JSON
crux index editor.seo       # Show detail for a specific item
crux index --json           # Output as JSON

Flag	Description
`--json`	Output as JSON

When called with a specific item ID, shows detailed information including description, tags, source, path, associated contexts, and Project Index metadata when available. crux dev indexes source files and .crux/quality JSON at startup, then watches source/config files for debounced incremental index refreshes. Runtime prompt/context/tool snapshots enrich the same backend-owned read model.

`crux inspect`

Show token breakdown for a prompt from recent traces.

crux inspect <prompt-id>     # e.g., crux inspect editor.seo
crux inspect editor.seo --json

Flag	Description
`--json`	Output as JSON

Finds the most recent successful trace for the given prompt ID and displays: system parts breakdown (source and token count per part), dropped contexts (with priority), and total prompt tokens.

Inspect data is captured when the devtools middleware is enabled. Run the prompt at least once before inspecting.

`crux lint`

Check authored Crux project health against a lint profile.

crux lint                          # Run the recommended profile
crux lint --profile strict         # Use a stricter profile
crux lint --fail-on warning        # Exit 1 if any warning (or error) is found
crux lint --include-suppressed     # Also show source-comment suppressed findings
crux lint --json                   # Output as JSON

Flag	Description
`--profile <name>`	Lint profile: `off`, `recommended` (default), `strict`, `experimental`
`--fail-on <severity>`	Exit `1` when selected findings include `error`, `warning`, or `info`
`--include-suppressed`	Include source-comment suppressed findings
`--json`	Output as JSON

Lists findings sorted by severity (error · warning · info), each with its rule id, target definition, source file:line, and — when available — the problem, rationale, suggested fix, and docs link. With --fail-on, a matching finding sets exit code 1 so the command can gate CI.

`crux quality`

Run, watch, list, inspect, and promote Quality evaluations (*.eval.ts files plus colocated prompt({ tests }) when a prompt registry is available). File-defined evaluations are discovered by convention; crux.config.ts is optional policy/defaults, not a required primitive registry. See the Quality guide.

crux quality run [id...]            # Run all discovered evaluations, or the listed ids
crux quality watch [id...]          # Incremental re-run on change (also embedded in `crux dev`)
crux quality list                   # Discovered evaluations (manifests), no execution
crux quality show <experimentId>    # Print one persisted experiment record (human or --json)
crux quality progress <evaluationId> # Recent runs and score series for one evaluation
crux quality cell-evidence <experimentId> --case <caseId> --variant <name> --trial <n>
crux quality promote <experimentId> [--variant <name>] [--pin-id <id>]

Quality is the supported evaluation CLI namespace.

`crux quality run` flags

Flag	Meaning
`[id...]`	Evaluation ids (explicit or path-derived). Unknown id → exit 2 with a nearest-match suggestion
`--case <pattern>`	Filter cases by id/name; `` glob; repeatable. Demotes gates to informational*
`--variant <name>`	Run a variant subset; repeatable. Demotes gates unless the baseline variant is included
`--trials <n>`	Override trials for this run
`--replay <mode>`	`live` · `record-new` · `replay-strict` · `refresh` — overrides declaration/config
`--rescore`	Reuse cached outputs, re-run scorers/expects only (token-free judge iteration)
`--experiment <label>`	Grouping label stored on the record(s)
`--json`	Write the Experiment record(s) (versioned schema) as a JSON array to stdout (bool, consistent with every other quality subcommand)
`--json-out <path>`	Write the same JSON array to a file path instead of stdout
`--junit <path>`	Write JUnit XML (one testsuite per evaluation × variant; gate failures as a synthetic `gates` testcase; experimentId/configFingerprint/score means in properties)
`--ci`	Force plain, non-animated output and no color, even on a TTY. Pair with explicit `--replay replay-strict` for token-free CI posture
`--max-concurrency <n>`	Cap parallel cells across all evaluations
`--quiet` / `--verbose`	Reporter detail level

Exit codes

Code	Meaning
`0`	All blocking gates passed (or a filtered run with no errored cells)
`1`	A blocking gate failed, an expect failed under the default policy, or any cell errored
`2`	Definition/discovery error — unknown id, duplicate ids, dataset schema failure, missing `generate` for a model-backed task, invalid config. Nothing was executed

The reporter prints a per-evaluation tree with per-cell status; failures show the assertion diff, its file:line source ref, and the devtools trace deep link. Deltas vs the baseline always render with ±SEM. Filtered runs print gates informational (filtered run); replay-strict misses print the missing cassette key and the re-record command.

Records on disk

Experiment records persist to .crux/quality/experiments/ (gitignored — the runner scaffolds the .gitignore); promoted baselines live in .crux/quality/baselines/ and cassettes in .crux/quality/cassettes/ — both committed.

`crux quality watch`

Re-collects on changes to eval files, their imports, datasets, or cassettes. Unchanged cells (same case identity, task fingerprint, params, and replay mode) are served from the output cache and re-scored only — so a scorer or expect edit reruns instantly without re-executing tasks. evaluate.only / case only narrow the watch set. The same engine is embedded in crux dev, streaming results to devtools.

`crux quality progress`

Shows the backend-owned progress read model for one evaluation: recent experiment rows, status counts, score series, and baseline state. Use --json when feeding the result to agents or custom dashboards.

The same local backend also serves evaluation-to-experiment relation reads for devtools: one endpoint returns recent experiment summaries for a single evaluation, and another returns experiment buckets grouped by evaluation. UI clients should consume those read models instead of rebuilding the relation from the flat experiment list.

crux quality progress support.refunds
crux quality progress support.refunds --limit 20 --json

`crux quality cell-evidence`

Prints the evidence record for one experiment cell. The record joins the persisted experiment, assertion ledger, score-threshold checks, source frame, baseline comparison, and retained trace hotspots when available.

crux quality cell-evidence <experimentId> --case simple-refund --variant default --trial 0
crux quality cell-evidence <experimentId> --case simple-refund --variant default --trial 0 --json

`crux quality promote`

Promote an experiment to the committed baseline; every subsequent run of that evaluation auto-compares against it and minDeltaVsBaseline gates become evaluable:

crux quality promote <experimentId>                  # promotes the declared baseline variant
crux quality promote <experimentId> --variant cheap  # promote a specific variant
crux quality promote <experimentId> --pin-id support.refunds  # pin a path-derived id

Promotion requires an explicit evaluation id (use --pin-id and add the printed one-line id to your evaluate() call — the CLI never rewrites source) and refuses filtered runs: paired statistics need the full case population. Promote failures are usage errors (exit 2, nothing written).

`crux flows`

List runtime flow sessions.

crux flows            # List all flow sessions
crux flows --json     # Output as JSON

Flag	Description
`--json`	Output as JSON

Table columns: TIME, NAME, STATUS, SESSION

`crux completion`

Generate a shell completion script for crux. Pipe it into your shell's completion system, or source it from your shell profile.

crux completion bash         # Print a bash completion script
crux completion zsh          # Print a zsh completion script
crux completion fish         # Print a fish completion script
crux completion powershell   # Print a PowerShell completion script

# Example: load zsh completions for the current session
source <(crux completion zsh)

Run crux completion <shell> --help for shell-specific installation steps.

Configuration

The CLI discovers the devtools server by connecting to localhost on the configured port (default 4400). Quality auto-attach accepts CRUX_DEVTOOLS_URL only when it points at a loopback devtools origin such as localhost, 127.0.0.1, or [::1]; remote observability and cloud upload targets must be configured explicitly by project code. The Quality runner resolves a project root from --cwd, --config, the nearest config file, the nearest package root, or the current working directory.

Set the CRUX_STARTUP_DEBUG=1 environment variable to show startup timing diagnostics without the --startup-debug flag.

Guide: Devtools
Guide: Quality

@crux/local

On this page