@crux/local
Local Crux runtime for the Go devtools server, TUI, traces, Quality, index, and lint.
# Install
npm install -g @crux/local
# or download the binary from the releases pageOverview
The Crux local runtime is a Go binary that hosts the HTTP API, WebSocket/SSE subscriptions, React devtools UI, TUI, SQLite-backed observability services, index services, Quality run orchestration, and lint presentation. File-backed observability databases use SQLite WAL, busy timeouts, and pooled connections for concurrent flush/read workloads; in-memory databases stay single-connection for tests. Node.js is used only for bounded helper workers such as Project Index indexing, source resolution, and Quality evaluation execution.
All query commands support a --json flag for machine-readable output. Table output is the default for human consumption.
crux dev
Start the devtools server and open the web UI.
crux dev
crux dev --port 4500
crux dev --tui # Interactive terminal dashboard
crux dev --tunnel # Create a public tunnel
crux dev --no-open # Don't open the browser| Flag | Default | Description |
|---|---|---|
--port | 4400 | Server port |
--tunnel | false | Create a public tunnel for remote access |
--no-open | false | Don't automatically open the browser |
--tui | false | Show interactive terminal dashboard (Bubbletea TUI) |
--startup-debug | false | Show startup timing diagnostics |
The Go server owns HTTP, WebSocket/SSE, read models, and UI serving. If the port is already in use, it automatically tries the next available port (up to +9).
Requires: Node.js >= 24 for helper workers that import project TypeScript.
The local HTTP API, in-process DirectClient, and WebSocket snapshot payloads share the same backend read-model registry. This keeps route parameters, query parsing, not-found handling, live invalidation messages, and initial snapshot payloads aligned between custom tooling and the embedded devtools UI.
Observability and quality run lists are newest-first pages by default so live web/TUI refreshes stay responsive on large local histories. Single-run detail endpoints remain exact. Use explicit limit/offset query parameters on the HTTP list endpoints when building custom local tooling that needs a different page size.
crux traces
List recent traces or show trace detail.
crux traces # List all recent traces
crux traces <trace-id> # Show full trace detail
crux traces --prompt editor.seo # Filter by prompt ID
crux traces --session abc123 # Filter by session ID
crux traces --live # Tail traces in real-time
crux traces --json # Output as JSON| Flag | Description |
|---|---|
--prompt | Filter by prompt ID |
--session | Filter by session ID |
--live | Tail traces in real-time via WebSocket |
--json | Output as JSON |
Table columns: TIME, STATUS, PROMPT, MODEL, DURATION, TOKENS, COST
Detail view (with trace ID argument) shows: trace ID, provider, duration, session, role, error, token breakdown (input/output/cache), cost, fallback attempts, streaming metrics (TTFT, throughput), and tool calls.
crux stats
Show aggregate statistics across all traces.
crux stats # Show current stats
crux stats --live # Continuously update on new events
crux stats --json # Output as JSON| Flag | Description |
|---|---|
--live | Continuously update stats via WebSocket |
--json | Output as JSON |
Shows: execution counts (success/error/running), average duration, error rate, total tokens, total cost, average cost per call, streaming metrics, memory read/write counts, compaction counts, budget level, agent coordination stats (handoffs, delegates, blackboard updates), tool call counts, and security warnings.
Also displays a per-prompt usage breakdown table with call counts, errors, average duration, and total cost.
crux cost
Show tracked model spend from the latest cost report.
crux cost # Show the cost breakdown
crux cost --json # Output as JSON| Flag | Description |
|---|---|
--json | Output as JSON |
Prints the total spend and event count, then per-model and per-prompt
breakdowns, followed by any threshold warn/limit events. Requires
withCostTracking() in your Crux config — without it, no cost events are
recorded and the command says so.
crux config
Render the effective Crux configuration: every domain config() accepts, with each value resolved
and tagged by where it came from. crux config inspect imports your crux.config.ts in inert
CRUX_INDEX=1 mode (no runtime side effects — no observability transport, no bridge, no store) so the
view reflects your explicit overrides, not just defaults.
crux config inspect
crux config inspect --json
crux config inspect --cwd packages/backend
crux config inspect --config crux.config.ts| Flag | Description |
|---|---|
--json | Output the full effective configuration as JSON |
--cwd | Project root to inspect (default: nearest config/package) |
--config | Crux config path relative to the project root |
--name | Project name to include in worker resolution |
The view is organized by config() domain, in interface order:
Project— root and package nameConfig file— the resolved file, how it was located, its load status, and any import errorquality:—id,dir,include,exclude,redact, and the run defaultstrials/concurrency/timeoutMs/replay(the fullQualityConfigsurface)generation:—autoEscape,securityWarnings, and whether atokenizer/middlewareis boundindexer:— extensiontrustmode and configuredextensionsobservability:—enabled,serverUrl, and whether a customtransportis bounddevtools:—serverUrland whether abridgeis configuredpersistence:— whether a globalstoreis boundlint:— selectedprofileand rule-override countplugins:— installed plugin namesDiscovered— a compact count of authored definitions / relations / evaluations (context, not config; the full index lives behindcrux index)Diagnostics— config-load and project-model findings
Every value carries a dim origin tag so a zero-config project reads as "these are the defaults Crux
applied": (default) for a built-in default, (config) for an explicit value from crux.config.ts,
(package.json) for package-derived values, and (set) for a bound non-serializable value (store,
tokenizer, middleware, transport). The config-file status uses a colored glyph (✓ loaded,
✗ missing/import-failed, ● unrecognized). Paths are normalized for readability: the project root
collapses to ~ and every other path is shown relative to it (.crux/quality, evals/...).
On an interactive terminal an animated loader is shown while the config resolves; color and the loader
are suppressed under --no-color, NO_COLOR, or a non-TTY pipe, and --json emits the machine-readable
form (each setting as { value, origin }, full absolute paths preserved). Worker lifecycle logs are
silenced for this one-shot command — set CRUX_STARTUP_DEBUG=1 to surface them when troubleshooting.
A broken config never makes inspection fail: an import error degrades to an all-defaults view with an
import-failed config status and the error surfaced under Diagnostics.
crux index
List Project Index definitions from the Go devtools service.
crux index # List all
crux index prompts # List prompts only
crux index contexts # List contexts only
crux index tools # List tools only
crux index definitions # List all discovered definitions
crux index diagnostics # List index diagnostics
crux index reindex # Rebuild from source files and .crux/quality JSON
crux index editor.seo # Show detail for a specific item
crux index --json # Output as JSON| Flag | Description |
|---|---|
--json | Output as JSON |
When called with a specific item ID, shows detailed information including description, tags, source, path, associated contexts, and Project Index metadata when available. crux dev indexes source files and .crux/quality JSON at startup, then watches source/config files for debounced incremental index refreshes. Runtime prompt/context/tool snapshots enrich the same backend-owned read model.
crux inspect
Show token breakdown for a prompt from recent traces.
crux inspect <prompt-id> # e.g., crux inspect editor.seo
crux inspect editor.seo --json| Flag | Description |
|---|---|
--json | Output as JSON |
Finds the most recent successful trace for the given prompt ID and displays: system parts breakdown (source and token count per part), dropped contexts (with priority), and total prompt tokens.
Inspect data is captured when the devtools middleware is enabled. Run the prompt at least once before inspecting.
crux lint
Check authored Crux project health against a lint profile.
crux lint # Run the recommended profile
crux lint --profile strict # Use a stricter profile
crux lint --fail-on warning # Exit 1 if any warning (or error) is found
crux lint --include-suppressed # Also show source-comment suppressed findings
crux lint --json # Output as JSON| Flag | Description |
|---|---|
--profile <name> | Lint profile: off, recommended (default), strict, experimental |
--fail-on <severity> | Exit 1 when selected findings include error, warning, or info |
--include-suppressed | Include source-comment suppressed findings |
--json | Output as JSON |
Lists findings sorted by severity (error · warning · info), each with its
rule id, target definition, source file:line, and — when available — the
problem, rationale, suggested fix, and docs link. With --fail-on, a matching
finding sets exit code 1 so the command can gate CI.
crux quality
Run, watch, list, inspect, and promote Quality evaluations (*.eval.ts files
plus colocated prompt({ tests }) when a prompt registry is available).
File-defined evaluations are discovered by convention; crux.config.ts is
optional policy/defaults, not a required primitive registry. See the
Quality guide.
crux quality run [id...] # Run all discovered evaluations, or the listed ids
crux quality watch [id...] # Incremental re-run on change (also embedded in `crux dev`)
crux quality list # Discovered evaluations (manifests), no execution
crux quality show <experimentId> # Print one persisted experiment record (human or --json)
crux quality progress <evaluationId> # Recent runs and score series for one evaluation
crux quality cell-evidence <experimentId> --case <caseId> --variant <name> --trial <n>
crux quality promote <experimentId> [--variant <name>] [--pin-id <id>]Quality is the supported evaluation CLI namespace.
crux quality run flags
| Flag | Meaning |
|---|---|
[id...] | Evaluation ids (explicit or path-derived). Unknown id → exit 2 with a nearest-match suggestion |
--case <pattern> | Filter cases by id/name; * glob; repeatable. Demotes gates to informational |
--variant <name> | Run a variant subset; repeatable. Demotes gates unless the baseline variant is included |
--trials <n> | Override trials for this run |
--replay <mode> | live · record-new · replay-strict · refresh — overrides declaration/config |
--rescore | Reuse cached outputs, re-run scorers/expects only (token-free judge iteration) |
--experiment <label> | Grouping label stored on the record(s) |
--json | Write the Experiment record(s) (versioned schema) as a JSON array to stdout (bool, consistent with every other quality subcommand) |
--json-out <path> | Write the same JSON array to a file path instead of stdout |
--junit <path> | Write JUnit XML (one testsuite per evaluation × variant; gate failures as a synthetic gates testcase; experimentId/configFingerprint/score means in properties) |
--ci | Force plain, non-animated output and no color, even on a TTY. Pair with explicit --replay replay-strict for token-free CI posture |
--max-concurrency <n> | Cap parallel cells across all evaluations |
--quiet / --verbose | Reporter detail level |
Exit codes
| Code | Meaning |
|---|---|
0 | All blocking gates passed (or a filtered run with no errored cells) |
1 | A blocking gate failed, an expect failed under the default policy, or any cell errored |
2 | Definition/discovery error — unknown id, duplicate ids, dataset schema failure, missing generate for a model-backed task, invalid config. Nothing was executed |
The reporter prints a per-evaluation tree with per-cell status; failures show
the assertion diff, its file:line source ref, and the devtools trace deep
link. Deltas vs the baseline always render with ±SEM. Filtered runs print
gates informational (filtered run); replay-strict misses print the missing
cassette key and the re-record command.
Records on disk
Experiment records persist to .crux/quality/experiments/ (gitignored — the
runner scaffolds the .gitignore); promoted baselines live in
.crux/quality/baselines/ and cassettes in .crux/quality/cassettes/ — both
committed.
crux quality watch
Re-collects on changes to eval files, their imports, datasets, or cassettes.
Unchanged cells (same case identity, task fingerprint, params, and replay
mode) are served from the output cache and re-scored only — so a scorer or
expect edit reruns instantly without re-executing tasks. evaluate.only /
case only narrow the watch set. The same engine is embedded in crux dev,
streaming results to devtools.
crux quality progress
Shows the backend-owned progress read model for one evaluation: recent
experiment rows, status counts, score series, and baseline state. Use --json
when feeding the result to agents or custom dashboards.
The same local backend also serves evaluation-to-experiment relation reads for devtools: one endpoint returns recent experiment summaries for a single evaluation, and another returns experiment buckets grouped by evaluation. UI clients should consume those read models instead of rebuilding the relation from the flat experiment list.
crux quality progress support.refunds
crux quality progress support.refunds --limit 20 --jsoncrux quality cell-evidence
Prints the evidence record for one experiment cell. The record joins the persisted experiment, assertion ledger, score-threshold checks, source frame, baseline comparison, and retained trace hotspots when available.
crux quality cell-evidence <experimentId> --case simple-refund --variant default --trial 0
crux quality cell-evidence <experimentId> --case simple-refund --variant default --trial 0 --jsoncrux quality promote
Promote an experiment to the committed baseline; every subsequent run of that
evaluation auto-compares against it and minDeltaVsBaseline gates become
evaluable:
crux quality promote <experimentId> # promotes the declared baseline variant
crux quality promote <experimentId> --variant cheap # promote a specific variant
crux quality promote <experimentId> --pin-id support.refunds # pin a path-derived idPromotion requires an explicit evaluation id (use --pin-id and add the
printed one-line id to your evaluate() call — the CLI never rewrites
source) and refuses filtered runs: paired statistics need the full case
population. Promote failures are usage errors (exit 2, nothing written).
crux flows
List runtime flow sessions.
crux flows # List all flow sessions
crux flows --json # Output as JSON| Flag | Description |
|---|---|
--json | Output as JSON |
Table columns: TIME, NAME, STATUS, SESSION
crux completion
Generate a shell completion script for crux. Pipe it into your shell's
completion system, or source it from your shell profile.
crux completion bash # Print a bash completion script
crux completion zsh # Print a zsh completion script
crux completion fish # Print a fish completion script
crux completion powershell # Print a PowerShell completion script
# Example: load zsh completions for the current session
source <(crux completion zsh)Run crux completion <shell> --help for shell-specific installation steps.
Configuration
The CLI discovers the devtools server by connecting to localhost on the configured port (default 4400). Quality auto-attach accepts CRUX_DEVTOOLS_URL only when it points at a loopback devtools origin such as localhost, 127.0.0.1, or [::1]; remote observability and cloud upload targets must be configured explicitly by project code. The Quality runner resolves a project root from --cwd, --config, the nearest config file, the nearest package root, or the current working directory.
Set the CRUX_STARTUP_DEBUG=1 environment variable to show startup timing diagnostics without the --startup-debug flag.