
Scoring

LLM-as-a-judge scoring and pre-built quality metrics.

```ts
import { llmJudge, metrics, judgeConstraint } from '@crux/core/scoring'
```

llmJudge(config)

Create a reusable judge instance.

| Field | Type | Description |
|---|---|---|
| `id` | `string` | Judge identifier |
| `criteria` | `string` | What to evaluate |
| `scale` | `{ min: number, max: number }` | Score range |
| `rubric` | `Record<number, string>?` | Description of each score level |
| `chainOfThought` | `boolean?` | Enable chain-of-thought reasoning (default: `true`) |
| `fewShot` | `JudgeFewShot[]?` | Calibration examples |
| `generate` | `GenerateObjectFn` | Structured-generation function |
| `model` | `unknown` | Judge model |

Returns: JudgeInstance

| Member | Description |
|---|---|
| `.score(input)` | Score an input/output pair |
| `.id` | Judge identifier |

.score() input:

| Field | Type | Description |
|---|---|---|
| `input` | `string` | The query or prompt |
| `output` | `string` | The response to evaluate |
| `reference` | `string?` | Optional ground truth |

.score() returns JudgeResult:

| Field | Type | Description |
|---|---|---|
| `score` | `number` | Score, clamped to the judge's scale |
| `reasoning` | `string` | Chain-of-thought explanation |
| `metricId` | `string` | The judge's `id` |
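To make the `JudgeResult` contract concrete, here is a minimal, self-contained sketch of a judge-like object. It does not import `@crux/core`. The local types, `makeJudgeSketch`, `clampToScale`, and the stub verdict function are hypothetical stand-ins. The sketch mirrors only two documented behaviors: `score` is clamped to the configured scale, and `metricId` echoes the judge id.

```typescript
// Local stand-ins for the documented shapes; NOT imported from @crux/core.
interface Scale { min: number; max: number }
interface JudgeInputSketch { input: string; output: string; reference?: string }
interface JudgeResultSketch { score: number; reasoning: string; metricId: string }

// Hypothetical raw verdict, as a structured-generation call might return it.
type RawVerdict = { score: number; reasoning: string }

// Clamp a raw model score into the judge's configured scale.
function clampToScale(score: number, scale: Scale): number {
  return Math.min(scale.max, Math.max(scale.min, score))
}

// Build a judge-like object from an id, a scale, and a hypothetical verdict function.
function makeJudgeSketch(
  id: string,
  scale: Scale,
  verdict: (x: JudgeInputSketch) => Promise<RawVerdict>,
) {
  return {
    id,
    async score(x: JudgeInputSketch): Promise<JudgeResultSketch> {
      const raw = await verdict(x)
      // metricId mirrors the judge id; score is forced into [min, max]
      return { score: clampToScale(raw.score, scale), reasoning: raw.reasoning, metricId: id }
    },
  }
}

// Usage: a stub "model" that overshoots the 1-5 scale.
const stubJudge = makeJudgeSketch('relevance', { min: 1, max: 5 }, async () => ({
  score: 7,
  reasoning: 'Answers the question directly.',
}))

stubJudge.score({ input: 'What is 2 + 2?', output: '4' }).then((r) => console.log(r.score)) // 5
```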

metrics

Six pre-built judges. Each takes { generate: GenerateObjectFn, model } and returns a JudgeInstance.

| Metric | Criteria | Scale |
|---|---|---|
| `metrics.relevance()` | Is the output relevant to the input query? | 1–5 |
| `metrics.faithfulness()` | Is the output faithful to provided context? | 1–5 |
| `metrics.coherence()` | Is the output logically coherent and well-structured? | 1–5 |
| `metrics.completeness()` | Does the output address all aspects of the query? | 1–5 |
| `metrics.toxicity()` | Is the output free from harmful content? (5 = safe) | 1–5 |
| `metrics.conciseness()` | Is the output appropriately concise? | 1–5 |
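
```ts
import { metrics } from '@crux/core/scoring'
// generateObjectFn, model, query, and response are assumed to be in scope
const relevance = metrics.relevance({ generate: generateObjectFn, model })
const result = await relevance.score({ input: query, output: response })
```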
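Every metric shares the 1–5 scale, and each is phrased so that a higher score is better. This includes toxicity, where 5 = safe. Scores from several metrics can therefore be normalized with (score − min) / (max − min) and compared directly. The sketch below assumes only the documented `JudgeInstance` shape. `JudgeLike`, `normalize`, `scoreAll`, and the stub judges are hypothetical. In real use, the array would hold instances returned by `metrics.relevance(...)`, `metrics.toxicity(...)`, and so on.

```typescript
// Minimal stand-in for the documented JudgeInstance shape (not imported from @crux/core).
interface JudgeLike {
  id: string
  score(x: { input: string; output: string; reference?: string }): Promise<{
    score: number
    reasoning: string
    metricId: string
  }>
}

// Map a score on [min, max] to [0, 1]; defaults match the metrics' shared 1-5 scale.
function normalize(score: number, min = 1, max = 5): number {
  return (score - min) / (max - min)
}

// Score one input/output pair with several judges concurrently, keyed by metricId.
async function scoreAll(
  judges: JudgeLike[],
  pair: { input: string; output: string },
): Promise<Record<string, number>> {
  const results = await Promise.all(judges.map((j) => j.score(pair)))
  const byId: Record<string, number> = {}
  for (const r of results) byId[r.metricId] = normalize(r.score)
  return byId
}

// Hypothetical stubs standing in for metrics.relevance(...) and metrics.toxicity(...).
const stub = (id: string, s: number): JudgeLike => ({
  id,
  score: async () => ({ score: s, reasoning: 'stub', metricId: id }),
})

scoreAll([stub('relevance', 4), stub('toxicity', 5)], { input: 'q', output: 'a' }).then(console.log)
// { relevance: 0.75, toxicity: 1 }
```

Running the judges through `Promise.all` keeps latency close to that of the slowest single judge instead of the sum of all judge calls.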

judgeConstraint(judge, opts)

Wraps a judge as an ordinary Constraint so that a scored quality bar can be enforced online. The returned constraint adds no special behavior. It works exactly like a hand-written constraint(): the safety session runs it with the usual retries, audits, and observability, and the judge's reasoning becomes the corrective feedback for regeneration rounds.

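```ts
import { llmJudge, judgeConstraint } from '@crux/core/scoring'

const brandVoice = llmJudge({
  id: 'brand-voice',
  criteria: 'Does the copy match the warm, direct brand voice?',
  scale: { min: 1, max: 10 },
  generate: generateObjectFn, // your structured-generation function
  model, // the judge model
})

// Passes only when the judge scores 7 or higher on its 1-10 scale
const brandVoiceGate = judgeConstraint(brandVoice, { min: 7 })
// → an ordinary Constraint named "brand-voice"
```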

| Option | Type | Description |
|---|---|---|
| `min` | `number` | Minimum acceptable score on the judge's own scale (inclusive) |
| `severity` | `'assert' \| 'suggest'?` | Constraint severity (default: `'assert'`) |
| `maxRetries` | `number?` | Per-constraint retry budget (default: `2`) |
| `category` | `string?` | Risk-category label carried into audits |
| `feedback` | `((result: JudgeResult) => string)?` | Retry feedback (default: the judge's reasoning) |
| `generate` | `GenerateObjectFn?` | Override for the judge's `generate` in the production call |
| `model` | `unknown?` | Override for the judge's `model` in the production call |
| `input` | `((output, ctx) => string)?` | Derives the judge's `input` field (default: empty string) |
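The defaults in the options table amount to a small resolution step plus an inclusive threshold test. Below is a sketch of that reading, not the library's implementation. `GateOptions`, `resolveGateOptions`, and `passes` are hypothetical names, and treating the "empty" `input` default as the string `''` is an assumption.

```typescript
// Local mirror of the documented judgeConstraint options (generate/model overrides omitted).
interface JudgeResultLike {
  score: number
  reasoning: string
  metricId: string
}
interface GateOptions {
  min: number
  severity?: 'assert' | 'suggest'
  maxRetries?: number
  category?: string
  feedback?: (result: JudgeResultLike) => string
  input?: (output: unknown, ctx: unknown) => string
}

// Fill in the documented defaults. Assumption: the "empty" input default is ''.
// `??` (not `||`) so an explicit maxRetries of 0 is kept.
function resolveGateOptions(opts: GateOptions) {
  return {
    min: opts.min,
    severity: opts.severity ?? 'assert',
    maxRetries: opts.maxRetries ?? 2,
    category: opts.category,
    feedback: opts.feedback ?? ((r: JudgeResultLike) => r.reasoning),
    input: opts.input ?? ((_output: unknown, _ctx: unknown) => ''),
  }
}

// min is inclusive: a score exactly equal to min passes.
function passes(result: JudgeResultLike, min: number): boolean {
  return result.score >= min
}

const resolved = resolveGateOptions({ min: 7 })
const atThreshold = { score: 7, reasoning: 'Warm, slightly formal.', metricId: 'brand-voice' }
console.log(resolved.severity, resolved.maxRetries, passes(atThreshold, resolved.min)) // assert 2 true
console.log(resolved.feedback(atThreshold)) // Warm, slightly formal.
```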

Returns: `Constraint<TSchema>`. Pass it anywhere constraints are accepted: per call, per prompt, or in createSafetyPlugin(). Like constraint() itself, the factory is generic over the parsed-output schema. Annotate the `input` callback's output parameter as `ConstraintOutput<typeof mySchema>`, and `output.parsed` is typed instead of `unknown`.

On audit entries, the check's `metadata.judge` field carries a `JudgeConstraintVerdict`: `{ metricId, score, min, reasoning, detail? }`. The `detail` field is present, and typed by the judge's `TDetail`, when the judge has a `detailSchema`.
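As described for judgeConstraint(), a failing score turns the judge's reasoning into feedback for the next regeneration round, and each check leaves a verdict on its audit entry. The following conceptual model shows that loop end to end. Everything in it is hypothetical: `runGate`, the `Generate` and `Judge` types, and the stubs. One further assumption concerns the retry budget: `maxRetries` is counted as retries after the first attempt, which allows up to three attempts by default.

```typescript
// Local mirror of the documented JudgeConstraintVerdict fields (detail omitted).
interface VerdictSketch {
  metricId: string
  score: number
  min: number
  reasoning: string
}

// Hypothetical signatures for this sketch only.
type Generate = (feedback?: string) => Promise<string>
type Judge = (output: string) => Promise<{ score: number; reasoning: string; metricId: string }>

// Conceptual model of a judge-backed constraint: score each draft, record a verdict,
// and on failure regenerate with the judge's reasoning as feedback.
// Assumption: maxRetries counts retries after the first attempt.
async function runGate(generate: Generate, judge: Judge, min: number, maxRetries = 2) {
  const audit: VerdictSketch[] = []
  let feedback: string | undefined
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const output = await generate(feedback)
    const r = await judge(output)
    audit.push({ metricId: r.metricId, score: r.score, min, reasoning: r.reasoning })
    if (r.score >= min) return { passed: true, output, audit } // inclusive threshold
    feedback = r.reasoning // the judge's reasoning becomes the retry feedback
  }
  return { passed: false, output: undefined, audit }
}

// Stub generator: returns a warmer draft once it has received feedback.
const gen: Generate = async (feedback) =>
  feedback ? 'Hi! Quick update on your order.' : 'Dear valued customer, we regret to inform you.'

// Stub judge: rewards drafts that open with "Hi".
const judge: Judge = async (output) =>
  output.startsWith('Hi')
    ? { score: 8, reasoning: 'Warm and direct.', metricId: 'brand-voice' }
    : { score: 4, reasoning: 'Too formal; drop the stiff salutation.', metricId: 'brand-voice' }

runGate(gen, judge, 7).then((res) => console.log(res.passed, res.audit.map((v) => v.score)))
// true [ 4, 8 ]
```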

For quality runs, scorers.judge() in @crux/core/quality reuses this judge machinery with rubric and choice-score modes.

Types

```ts
import type {
  JudgeConfig, JudgeInstance, JudgeResult, JudgeInput,
  JudgeScoreOptions, JudgeFewShot, JudgeConstraintOptions, JudgeConstraintVerdict,
} from '@crux/core/scoring'
```
