Guardrails

Composable safety pipeline for I/O validation — validate, redact, transform, and block content before and after LLM generation.

No AI SDK offers this natively; guardrails are a Crux-only feature that works the same way across all adapters.

Quick Start

guardrails.ts

import { guardrail, createSafetyPlugin } from '@crux/core/safety'

// Block prompt injections on input
const injectionGuard = guardrail({
  name: 'injection',
  phase: 'input',
  validate: async (content) => {
    if (/ignore\b.{0,30}\bprevious\b.{0,30}\binstructions/i.test(content))
      return { action: 'block', reason: 'Prompt injection detected' }
    return { action: 'pass' }
  },
})

// Redact PII on output
const piiGuard = guardrail({
  name: 'pii',
  phase: 'output',
  stream: { buffer: 'full' },
  validate: async (content) => {
    // Note: the TLD class is [A-Za-z]; a `|` inside a character class would match a literal pipe
    const redacted = content.replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, '[EMAIL]')
    if (redacted !== content) return { action: 'redact', content: redacted }
    return { action: 'pass' }
  },
})

// Install as plugin — runs on every generate() and stream() call
config({ plugins: [createSafetyPlugin({ guardrails: [injectionGuard, piiGuard] })] })

Three-Layer Safety Model

Guardrails are Layer 1 of Crux's safety architecture:

Layer	Primitive	What it validates	When
I/O Safety	`guardrail()`	Content safety, PII, injection, toxicity	Before/after generate()
Action Safety	`defineToolFilter()` (future)	Tool call approval, rate limits, cost gates	During tool loop
Output Quality	`constraint()`	Semantic invariants (cite sources, language, tone)	After generation, with retry

Defining Guards

`guardrail()`

Creates a frozen guardrail object. Each guard declares a single phase and a validate function.

import { guardrail } from '@crux/core/safety'

const guard = guardrail({
  name: 'my-guard',
  phase: 'output',
  validate: async (content, ctx) => {
    // content: the text to validate
    //   input phase  → last user message
    //   output phase → model response
    // ctx: GuardrailContext with phase, promptId, model, messages, systemPrompt, traceId, metadata
    return { action: 'pass' }
  },
})

Guards are frozen objects — define once, compose freely. Guardrails filter content but never re-call the model. For retry-with-feedback on output quality, use constraint().

Actions

Every validate call returns an action:

Action	Phase	What happens
`pass`	input, output, chunk	Content is safe. Continue to next guard
`block`	input, output, chunk	Hard stop. Throws `GuardrailBlockedError`. Remaining guards skip
`redact`	input, output, chunk	Destructive safety removal. Returns modified `content` + optional `entities`. Modified content flows to next guard
`transform`	input, output, chunk	Constructive quality improvement. Returns modified `content`. Modified content flows to next guard
`warn`	input, output, chunk	Log but continue. Returns `reason`. Content unchanged
`hold`	chunk only	Don't emit yet. Merge this chunk into the next `onChunk` call. Enables cross-chunk transforms

redact vs transform: Both modify content and flow forward the same way. The distinction is semantic — redactions carry safety audit metadata (what was removed, original content), transforms are quality improvements. Devtools displays them differently.

GuardrailContext

Every validate and onChunk call receives context:

interface GuardrailContext {
  phase: 'input' | 'output'
  promptId: string | undefined
  model: string | undefined
  messages: readonly Message[]
  systemPrompt: string | undefined
  traceId: string | undefined
  metadata: Record<string, unknown>
}

Use ctx.model for model-specific rules (stricter for cheaper models), ctx.messages for conversation-aware checks, ctx.metadata for app-specific passthrough.
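As an illustration of a model-specific rule, here is a minimal sketch in which the same finding is blocked on cheaper models and only warned about on others. The `SketchContext` type, the `CHEAP_MODELS` set, and the model ids are hypothetical stand-ins, not part of Crux, and the decision function is synchronous for brevity; in a real guard the same logic would sit inside `validate`.

```typescript
// Hypothetical local types; only the `model` field of GuardrailContext is modeled here.
interface SketchContext {
  model: string | undefined
}

type Verdict =
  | { action: 'pass' }
  | { action: 'warn'; reason: string }
  | { action: 'block'; reason: string }

// Assumed model ids, purely illustrative.
const CHEAP_MODELS = new Set(['small-model'])

// Same finding, different severity depending on which model handles the call.
function decide(content: string, ctx: SketchContext): Verdict {
  if (!/system prompt/i.test(content)) return { action: 'pass' }
  const reason = 'Mentions the system prompt'
  const strict = ctx.model !== undefined && CHEAP_MODELS.has(ctx.model)
  return strict ? { action: 'block', reason } : { action: 'warn', reason }
}
```

Keeping the severity decision in one place makes it easy to test the rule without constructing a full context.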

Execution: the Safety session

You never execute guards by hand. Every adapter constructs a per-call Safety session (createSafety()) that merges all scopes, runs input guards before the first provider call — writing redacted/transformed content back into the messages the model sees — and runs constraints followed by output guards on the final output. Custom adapter dialects drive the same session; see the Safety reference.

Execution model

Input guards (before the first provider call):

Guard the last user message (with prompt-text fallback)
Execute in declaration order
Redacted/transformed content flows forward to subsequent guards — and to the provider
First block short-circuits — remaining guards skip, GuardrailBlockedError propagates

Output guards (after constraints accept the final output):

Same sequential execution and content chaining
Skipped when the run suspends for tool approval — the response is a request for permission, not a final output

Content chaining

When a guard redacts or transforms, the modified content is what the next guard sees:

Guard A (PII redactor): "Call John at 555-1234" → "Call [NAME] at [PHONE]"
Guard B (inspector):    sees "Call [NAME] at [PHONE]" — no false positive on already-redacted content
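The sequential execution, content chaining, and block short-circuit described above can be sketched as a small loop. This is not the Crux implementation: `SketchGuard`, `runGuards`, and `BlockedError` are hypothetical local names, and the checks are synchronous for brevity where real guards are async.

```typescript
// Hypothetical local types that mirror the documented actions.
type Verdict =
  | { action: 'pass' }
  | { action: 'block'; reason: string }
  | { action: 'redact' | 'transform'; content: string }
  | { action: 'warn'; reason: string }

interface SketchGuard {
  name: string
  check: (content: string) => Verdict
}

interface Applied {
  guard: string
  action: Verdict['action']
  original: string // content as this guard received it
}

class BlockedError extends Error {
  guard: string
  constructor(guard: string, reason: string) {
    super(reason)
    this.guard = guard
  }
}

// Runs guards in declaration order; each guard sees the previous guard's output.
function runGuards(guards: SketchGuard[], input: string): { content: string; applied: Applied[] } {
  let content = input
  const applied: Applied[] = []
  for (const g of guards) {
    const verdict = g.check(content)
    if (verdict.action === 'block') throw new BlockedError(g.name, verdict.reason) // remaining guards never run
    if (verdict.action === 'pass') continue
    applied.push({ guard: g.name, action: verdict.action, original: content })
    if (verdict.action === 'redact' || verdict.action === 'transform') content = verdict.content
  }
  return { content, applied }
}

const phoneRedactor: SketchGuard = {
  name: 'phone',
  check: (c) => {
    const out = c.replace(/\b\d{3}-\d{4}\b/g, '[PHONE]')
    return out === c ? { action: 'pass' } : { action: 'redact', content: out }
  },
}

const digitInspector: SketchGuard = {
  name: 'digits',
  check: (c) => (/\d/.test(c) ? { action: 'block', reason: 'Raw digits in output' } : { action: 'pass' }),
}

// The inspector runs second, so it only sees the already-redacted text.
const chained = runGuards([phoneRedactor, digitInspector], 'Call John at 555-1234')
// chained.content === 'Call John at [PHONE]'
```

Reversing the order would make the inspector see the raw phone number and block, which is why declaration order matters.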

Retry with feedback?

Guardrails never re-call the model. If you need to validate output quality and retry with feedback until requirements are met, use constraint() instead. The session checks constraints in parallel and retries with their combined feedback, and only then runs output guards.

Audit trail

Every applied guard lands in the audit on result._meta.guardrails:

const result = await adapter.generate(prompt, opts)

result._meta.guardrails.blocked              // false (would have thrown if true)
result._meta.guardrails.applied[0].guard     // 'pii'
result._meta.guardrails.applied[0].category  // 'pii' (optional risk-type label)
result._meta.guardrails.applied[0].action    // 'redact'
result._meta.guardrails.applied[0].original  // original content before this guard
result._meta.guardrails.applied[0].durationMs // 2.3

Scoping

Guardrails support four scoping levels, merged via union. When two guards share a name, the more specific scope wins: per-call overrides per-prompt, which overrides per-context, which overrides global.

Global (all generate calls)

import { createSafetyPlugin } from '@crux/core/safety'

config({
  plugins: [createSafetyPlugin({ guardrails: [injectionGuard, piiGuard] })],
})

Per-prompt

const blogPrompt = prompt({
  system: 'You are a blog writer...',
  guardrails: [piiGuard, contentSafetyGuard],
  // ...
})

Per-context

const medicalContext = context({
  id: 'medical',
  system: 'Patient data context...',
  guardrails: [hipaaGuard],
})

// Any prompt that uses this context inherits hipaaGuard
const report = prompt({ use: [medicalContext], ... })

Per-call (highest precedence)

await adapter.generate(prompt, {
  guardrails: [strictModeGuard],
})
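The union-with-precedence merge described at the start of this section can be sketched as a map keyed by guard name. `mergeScopes` and `NamedGuard` are hypothetical names, and the resulting order (position of first declaration) is an assumption of this sketch rather than documented Crux behavior.

```typescript
// Hypothetical minimal shape: only the name matters for merging.
interface NamedGuard {
  name: string
}

interface Scopes<G extends NamedGuard> {
  global?: G[]
  context?: G[]
  prompt?: G[]
  call?: G[]
}

// Walk scopes from lowest to highest precedence so later scopes
// overwrite earlier ones whenever two guards share a name.
function mergeScopes<G extends NamedGuard>(scopes: Scopes<G>): G[] {
  const byName = new Map<string, G>()
  for (const scope of [scopes.global, scopes.context, scopes.prompt, scopes.call]) {
    for (const g of scope ?? []) byName.set(g.name, g)
  }
  return Array.from(byName.values())
}

const merged = mergeScopes({
  global: [{ name: 'pii', origin: 'global' }, { name: 'injection', origin: 'global' }],
  call: [{ name: 'pii', origin: 'call' }],
})
// merged holds two guards; the 'pii' entry is the per-call one.
```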

Audit trail

Guardrail audit attaches to result._meta.guardrails:

const result = await adapter.generate(prompt, opts)
result.text                       // safe (redacted/transformed) content
result._meta.guardrails           // { applied: [...], blocked: false }

`GuardrailBlockedError`

Thrown when a guard blocks content:

import { GuardrailBlockedError } from '@crux/core/safety'

try {
  await adapter.generate(prompt, opts)
} catch (e) {
  if (e instanceof GuardrailBlockedError) {
    e.guardrailId  // 'injection-detection'
    e.phase        // 'input'
    e.reason       // 'Prompt injection detected (score: 0.85)'
  }
}

Streaming

Buffer strategies

Guards declare how they interact with streaming:

Strategy	Use case	Behavior
`buffer: 'none'`	Real-time transforms (URL expansion, import fixes)	`onChunk` runs per-chunk. Client sees results immediately
`buffer: 'full'`	Post-stream validation (PII, safety, formatting)	Chunks accumulate. `validate` runs once after stream completes

Streaming guardrails run automatically in every adapter's stream() — register the guard and stream as usual:

const handle = await adapter.stream(prompt, { model, input, guardrails: [iconFixer, piiGuard] })
// Consumers see only the guarded stream; originals land in the audit on completion meta.

Mixed strategies work together. Each guard runs independently — buffer: 'none' guards process chunks in real-time while buffer: 'full' guards wait for completion. Constraints run report-only at end-of-stream (a live stream cannot regenerate).

Works with any text stream

For custom dialects or non-Crux streams, the session exposes the same protocol directly — safety.openStream().transform() returns a standard Web Streams API TransformStream<string, string>:

import { createSafety } from '@crux/core/safety'

const safety = createSafety({ call: { guardrails: [iconFixer, piiGuard] }, promptId, model })
const guarded = anyTextStream.pipeThrough(safety.openStream().transform())

The guards themselves are also standalone — guardrail() has no adapter dependencies. You can use @crux/core/safety purely for the safety primitives without the rest of Crux.

`onChunk` handler

For buffer: 'none' guards:

const urlExpander = guardrail({
  name: 'url-expander',
  phase: 'output',
  stream: { buffer: 'none' },
  onChunk: async (chunk, accumulated, ctx) => {
    // chunk: current text (may include held content from previous hold)
    // accumulated: everything received so far
    const expanded = chunk.replaceAll('__URL_1__', 'https://example.com/long-url')
    if (expanded !== chunk) return { action: 'transform', content: expanded }
    return { action: 'pass' }
  },
  validate: async () => ({ action: 'pass' }),
})

The `hold` pattern

When a transform needs to see a multi-token pattern (like an import statement that arrives across two chunks), return { action: 'hold' }:

LLM sends: "import { BadIcon }"
  → onChunk("import { BadIcon }")
  → guard: incomplete import — no 'from' clause yet
  → return { action: 'hold' }
  → pipeline: buffers, emits nothing to client

LLM sends: " from 'lucide-react'\n"
  → onChunk("import { BadIcon } from 'lucide-react'\n")  ← held + new, merged
  → guard: complete import! transform it
  → return { action: 'transform', content: "import { GoodIcon } from 'lucide-react'\n" }
  → pipeline: emits transformed content to client

How it works:

hold tells the pipeline "I need more data before I can decide"
The held content is stored in a per-guard buffer and prepended to the chunk on the next call
The guard sees progressively bigger chunks until it can act
Multiple consecutive holds just grow the buffer
On stream end, any held content flushes unchanged (graceful degradation)
Each guard's hold buffer is independent — one guard holding doesn't block others

The guard stays stateless. It doesn't track what was held. It just looks at chunk (which may be bigger than a single LLM token) and decides:

const iconFixer = guardrail({
  name: 'icon-fixer',
  phase: 'output',
  stream: { buffer: 'none' },
  onChunk: async (chunk) => {
    // Incomplete import (no quoted module path yet, single or double quotes)? Hold: don't emit yet.
    if (/import\s*\{/.test(chunk) && !/['"][^'"]*['"]/.test(chunk)) {
      return { action: 'hold' }
    }

    // Complete import for lucide-react? Fix invalid icons.
    const match = chunk.match(/import\s*\{([^}]+)\}\s*from\s*['"]lucide-react['"]/)
    if (match) {
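      // fixSpecifiers (not shown) must return the full corrected import statement, because it replaces match[0]
      const fixed = fixSpecifiers(match[1]!)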
      return { action: 'transform', content: chunk.replace(match[0], fixed) }
    }

    // Not an import — pass through.
    return { action: 'pass' }
  },
  validate: async () => ({ action: 'pass' }),
})
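The hold mechanics listed above (per-guard buffer, merge on the next call, flush at stream end) can be modeled in a few lines. This is a sketch under assumptions, not the Crux pipeline: `createHoldingStage` is a hypothetical name, the handler is synchronous for brevity, and an empty string stands for "emit nothing".

```typescript
type ChunkVerdict =
  | { action: 'pass' }
  | { action: 'hold' }
  | { action: 'transform'; content: string }

type OnChunk = (chunk: string) => ChunkVerdict

// One stage per guard, so each guard's hold buffer is independent.
function createHoldingStage(onChunk: OnChunk) {
  let held = ''
  return {
    // Returns the text to emit for this chunk ('' while holding).
    push(chunk: string): string {
      const merged = held + chunk // held content is prepended to the new chunk
      const verdict = onChunk(merged)
      if (verdict.action === 'hold') {
        held = merged // consecutive holds just grow the buffer
        return ''
      }
      held = ''
      return verdict.action === 'transform' ? verdict.content : merged
    },
    // On stream end, anything still held is emitted unchanged.
    flush(): string {
      const rest = held
      held = ''
      return rest
    },
  }
}

// Stateless handler: it only looks at the chunk it is given.
const fixIcon: OnChunk = (chunk) => {
  if (/import\s*\{/.test(chunk) && !/from\s*['"][^'"]*['"]/.test(chunk)) return { action: 'hold' }
  return chunk.includes('BadIcon')
    ? { action: 'transform', content: chunk.replace('BadIcon', 'GoodIcon') }
    : { action: 'pass' }
}
```

The state lives in the stage, not in the guard, which is what lets the handler stay a pure function of its input.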

Error recovery

Guard failures must never break the stream. Catch exceptions and pass through unchanged:

onChunk: async (chunk) => {
  try {
    return { action: 'transform', content: riskyTransform(chunk) }
  } catch {
    return { action: 'pass' } // guard failed, pass original
  }
}

Testing

`evaluateGuardrail()`

Test guards against a matrix of cases:

import { evaluateGuardrail } from '@crux/core/safety'

const report = await evaluateGuardrail(piiGuard, [
  { input: 'Email me at john@example.com', expect: 'redact' },
  // Assumes a piiGuard that also recognizes SSNs; the Quick Start version only redacts emails
  { input: 'SSN: 123-45-6789', expect: 'redact' },
  { input: 'Hello world', expect: 'pass' },
])

report.summary        // { total: 3, passed: 3, failed: 0 }
report.results[0]     // { input: '...', passed: true, action: 'redact', expected: 'redact', durationMs: 1.2 }

Errors in the guard are caught gracefully — the case reports action: 'error' with the message.
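To make the report shape concrete, here is a minimal sketch of how a case matrix can be evaluated, including the error handling just described. It is not the library's implementation: `evaluate`, `Check`, and `Case` are hypothetical names, timing is omitted, and the check is synchronous for brevity.

```typescript
type Action = 'pass' | 'block' | 'redact' | 'transform' | 'warn'

interface Case {
  input: string
  expect: Action
}

type Check = (input: string) => { action: Action }

function evaluate(check: Check, cases: Case[]) {
  const results = cases.map((c) => {
    try {
      const action = check(c.input).action
      return { input: c.input, expected: c.expect, action, passed: action === c.expect }
    } catch (e) {
      // A throwing guard fails the case instead of aborting the whole run.
      const message = e instanceof Error ? e.message : String(e)
      return { input: c.input, expected: c.expect, action: 'error' as const, passed: false, message }
    }
  })
  const passed = results.filter((r) => r.passed).length
  return { summary: { total: results.length, passed, failed: results.length - passed }, results }
}

// Illustrative email-only check.
const emailCheck: Check = (input) =>
  /\S+@\S+\.\S+/.test(input) ? { action: 'redact' } : { action: 'pass' }

const sketchReport = evaluate(emailCheck, [
  { input: 'Email me at john@example.com', expect: 'redact' },
  { input: 'Hello world', expect: 'pass' },
  { input: 'SSN: 123-45-6789', expect: 'redact' }, // fails: this check knows nothing about SSNs
])
// sketchReport.summary → { total: 3, passed: 2, failed: 1 }
```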

`isGuardrail()`

Runtime type guard:

import { isGuardrail } from '@crux/core/safety'

isGuardrail(piiGuard)              // true
isGuardrail({ _tag: 'Prompt' })    // false

Devtools & Observability

Guardrails emit events to the devtools protocol. This page documents the following event type:

Event	When	Key fields
`guardrail:run`	Every guard execution	`guardrailId`, `phase`, `action`, `reason`, `durationMs`

These events:

Appear in the devtools timeline alongside traces, tool calls, and memory events
Are emitted as OTel spans when @crux/otel is active
Are wired through InstrumentationHooks (onGuardrailRun)

Use InstrumentationHooks.onGuardrailRun plus the audit on result._meta.guardrails for application-level logging and EU AI Act audit trails. The optional category field on each guard lets reporting aggregate by risk type (pii, jailbreak, toxicity, ...) instead of by guard name.
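Aggregating by risk type could look like the sketch below. The `AppliedEntry` fields follow the audit entries shown earlier on this page; `countByCategory` is a hypothetical helper, and falling back to the guard name when no category is set is an assumption of this sketch.

```typescript
// Subset of the audit entry fields shown in the Audit trail section.
interface AppliedEntry {
  guard: string
  category?: string
  action: string
  durationMs: number
}

// Counts applied guards per category, using the guard name when no category is set.
function countByCategory(applied: AppliedEntry[]): Record<string, number> {
  const counts: Record<string, number> = {}
  for (const entry of applied) {
    const key = entry.category ?? entry.guard
    counts[key] = (counts[key] ?? 0) + 1
  }
  return counts
}

const counts = countByCategory([
  { guard: 'email-redactor', category: 'pii', action: 'redact', durationMs: 2.3 },
  { guard: 'phone-redactor', category: 'pii', action: 'redact', durationMs: 1.1 },
  { guard: 'url-expander', action: 'transform', durationMs: 0.4 },
])
// counts → { pii: 2, 'url-expander': 1 }
```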

Recipes

Production-grade guardrail implementations usually combine the patterns below with the broader Safety guide:

PII Detection & Redaction — Presidio-inspired recognizer pattern with context-enhanced confidence, Luhn validation, 3 redaction strategies
Prompt Injection Detection — 20+ regex patterns, multi-signal scoring, Unicode normalization, 5-layer defense model
Content Safety — Three-tier escalation (word-list, API classifier, LLM-as-judge), EU AI Act compliance
Streaming Transforms — v0-style LLM Suspense + Autofixer patterns with hold for cross-chunk fixes
