Guardrails
Composable safety pipeline for I/O validation — validate, redact, transform, and block content before and after LLM generation.
Guardrails run inside Crux itself rather than in any provider SDK, so the same guards apply unchanged across all adapters.
Quick Start
import { guardrail, createSafetyPlugin } from '@crux/core/safety'
// Block prompt injections on input
const injectionGuard = guardrail({
name: 'injection',
phase: 'input',
validate: async (content) => {
if (/ignore\b.{0,30}\bprevious\b.{0,30}\binstructions/i.test(content))
return { action: 'block', reason: 'Prompt injection detected' }
return { action: 'pass' }
},
})
// Redact PII on output
const piiGuard = guardrail({
name: 'pii',
phase: 'output',
stream: { buffer: 'full' },
validate: async (content) => {
const redacted = content.replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, '[EMAIL]')
if (redacted !== content) return { action: 'redact', content: redacted }
return { action: 'pass' }
},
})
// Install as plugin — runs on every generate() and stream() call
config({ plugins: [createSafetyPlugin({ guardrails: [injectionGuard, piiGuard] })] })
Three-Layer Safety Model
Guardrails are Layer 1 of Crux's safety architecture:
| Layer | Primitive | What it validates | When |
|---|---|---|---|
| I/O Safety | guardrail() | Content safety, PII, injection, toxicity | Before/after generate() |
| Action Safety | defineToolFilter() (future) | Tool call approval, rate limits, cost gates | During tool loop |
| Output Quality | constraint() | Semantic invariants (cite sources, language, tone) | After generation, with retry |
Defining Guards
guardrail()
Creates a frozen guardrail object. Each guard declares a single phase and a validate function.
import { guardrail } from '@crux/core/safety'
const guard = guardrail({
name: 'my-guard',
phase: 'output',
validate: async (content, ctx) => {
// content: the text to validate
// input phase → last user message
// output phase → model response
// ctx: GuardrailContext with phase, promptId, model, messages, systemPrompt, traceId, metadata
return { action: 'pass' }
},
})
Guards are frozen objects: define once, compose freely. Guardrails filter content but never re-call the model. For retry-with-feedback on output quality, use constraint().
Actions
Every validate call returns an action:
| Action | Phase | What happens |
|---|---|---|
| pass | input, output, chunk | Content is safe. Continue to the next guard |
| block | input, output, chunk | Hard stop. Throws GuardrailBlockedError; remaining guards are skipped |
| redact | input, output, chunk | Destructive safety removal. Returns modified content plus optional entities. Modified content flows to the next guard |
| transform | input, output, chunk | Constructive quality improvement. Returns modified content. Modified content flows to the next guard |
| warn | input, output, chunk | Log but continue. Returns a reason. Content unchanged |
| hold | chunk only | Don't emit yet. Merge this chunk into the next onChunk call. Enables cross-chunk transforms |
redact vs transform: Both modify content and flow forward the same way. The distinction is semantic — redactions carry safety audit metadata (what was removed, original content), transforms are quality improvements. Devtools displays them differently.
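To make the table concrete, here is a minimal, self-contained model of the action results and their effect on the content that flows onward. The names GuardAction, Step, and applyAction are hypothetical and exist only for this sketch. Crux's exported types may be named and shaped differently.

```typescript
// Hypothetical local model of guard results; Crux's real types may differ.
type GuardAction =
  | { action: 'pass' }
  | { action: 'block'; reason: string }
  | { action: 'redact'; content: string; entities?: string[] }
  | { action: 'transform'; content: string }
  | { action: 'warn'; reason: string }
  | { action: 'hold' } // only meaningful from onChunk

// What the pipeline does next after one guard returns.
type Step =
  | { kind: 'continue'; content: string; warning?: string }
  | { kind: 'blocked'; reason: string }
  | { kind: 'held' }

function applyAction(current: string, result: GuardAction): Step {
  switch (result.action) {
    case 'pass':
      return { kind: 'continue', content: current }
    case 'block':
      // Hard stop: the caller throws and skips the remaining guards.
      return { kind: 'blocked', reason: result.reason }
    case 'redact':
    case 'transform':
      // Same flow for both: the modified content goes to the next guard.
      return { kind: 'continue', content: result.content }
    case 'warn':
      // Logged only; the content is unchanged.
      return { kind: 'continue', content: current, warning: result.reason }
    case 'hold':
      // Nothing is emitted yet; the chunk is merged into the next one.
      return { kind: 'held' }
  }
}
```

The sketch shows that redact and transform share one code path. The difference between them lives only in the audit metadata, not in how content flows.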
GuardrailContext
Every validate and onChunk call receives context:
interface GuardrailContext {
phase: 'input' | 'output'
promptId: string | undefined
model: string | undefined
messages: readonly Message[]
systemPrompt: string | undefined
traceId: string | undefined
metadata: Record<string, unknown>
}
Use ctx.model for model-specific rules (stricter for cheaper models), ctx.messages for conversation-aware checks, ctx.metadata for app-specific passthrough.
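Below is a sketch of a rule that uses the context. It relies on local stand-in types that copy the GuardrailContext fields above, with Message simplified to a role and content. STRICT_MODELS, the character limits, and checkInputLength are hypothetical.

The function is synchronous to keep the sketch short. In Crux, validate is async, so you would wrap it as `validate: async (content, ctx) => checkInputLength(content, ctx)`.

```typescript
// Local stand-ins mirroring GuardrailContext; Message is simplified.
interface Message {
  role: 'system' | 'user' | 'assistant'
  content: string
}
interface GuardrailContext {
  phase: 'input' | 'output'
  promptId: string | undefined
  model: string | undefined
  messages: readonly Message[]
  systemPrompt: string | undefined
  traceId: string | undefined
  metadata: Record<string, unknown>
}
type Result =
  | { action: 'pass' }
  | { action: 'block'; reason: string }
  | { action: 'warn'; reason: string }

// Hypothetical model ids that get the stricter limit.
const STRICT_MODELS = new Set<string>(['small-model-a', 'small-model-b'])

function checkInputLength(content: string, ctx: GuardrailContext): Result {
  // Model-specific rule: a tighter length limit for the listed models.
  const limit = ctx.model !== undefined && STRICT_MODELS.has(ctx.model) ? 2000 : 8000
  if (content.length > limit) {
    return { action: 'block', reason: `Input over ${limit} chars for model ${ctx.model ?? 'unknown'}` }
  }
  // Conversation-aware rule: warn if this exact user text appears more than once.
  const sent = ctx.messages.filter((m) => m.role === 'user' && m.content === content).length
  if (sent > 1) {
    return { action: 'warn', reason: `Same user message appears ${sent} times` }
  }
  return { action: 'pass' }
}
```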
Execution: the Safety session
You never execute guards by hand. Every adapter constructs a per-call Safety session (createSafety()) that merges all scopes, runs input guards before the first provider call — writing redacted/transformed content back into the messages the model sees — and runs constraints followed by output guards on the final output. Custom adapter dialects drive the same session; see the Safety reference.
Execution model
Input guards (before the first provider call):
- Guard the last user message (with prompt-text fallback)
- Execute in declaration order
- Redacted/transformed content flows forward to subsequent guards — and to the provider
- The first block short-circuits: remaining guards are skipped and GuardrailBlockedError propagates
Output guards (after constraints accept the final output):
- Same sequential execution and content chaining
- Skipped when the run suspends for tool approval — the response is a request for permission, not a final output
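The ordering rules above reduce to a short loop. The following is a simplified, synchronous sketch with hypothetical names (Guard, BlockedError, runGuards). It is not Crux's session code, which is async and also records the audit.

It reuses the "Call John at 555-1234" example from the content-chaining notes to show a redactor feeding an inspector.

```typescript
// Simplified, synchronous model of the guard loop (hypothetical names).
type Result =
  | { action: 'pass' }
  | { action: 'block'; reason: string }
  | { action: 'redact'; content: string }
  | { action: 'transform'; content: string }
  | { action: 'warn'; reason: string }

interface Guard {
  name: string
  validate: (content: string) => Result
}

class BlockedError extends Error {
  readonly guard: string
  readonly reason: string
  constructor(guard: string, reason: string) {
    super(`${guard}: ${reason}`)
    this.guard = guard
    this.reason = reason
  }
}

function runGuards(guards: readonly Guard[], content: string): { content: string; warnings: string[] } {
  let current = content
  const warnings: string[] = []
  for (const guard of guards) {
    // Declaration order; each guard sees the previous guard's output.
    const result = guard.validate(current)
    switch (result.action) {
      case 'block':
        // First block wins: later guards never run.
        throw new BlockedError(guard.name, result.reason)
      case 'redact':
      case 'transform':
        current = result.content
        break
      case 'warn':
        warnings.push(`${guard.name}: ${result.reason}`)
        break
      case 'pass':
        break
    }
  }
  return { content: current, warnings }
}

// Redact first, then inspect the already-redacted text.
const redactor: Guard = {
  name: 'pii',
  validate: (c) => {
    const out = c.replace(/\bJohn\b/g, '[NAME]').replace(/\b\d{3}-\d{4}\b/g, '[PHONE]')
    return out !== c ? { action: 'redact', content: out } : { action: 'pass' }
  },
}
const inspector: Guard = {
  name: 'inspector',
  validate: (c) => (/\d{3}-\d{4}/.test(c) ? { action: 'warn', reason: 'raw phone number' } : { action: 'pass' }),
}
```

Running `runGuards([redactor, inspector], 'Call John at 555-1234')` returns the redacted text with no warnings. The inspector never sees the raw phone number.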
Content chaining
When a guard redacts or transforms, the modified content is what the next guard sees:
Guard A (PII redactor): "Call John at 555-1234" → "Call [NAME] at [PHONE]"
Guard B (inspector): sees "Call [NAME] at [PHONE]" (no false positive on already-redacted content)
Retry with feedback?
Guardrails never re-call the model. If you need to validate output quality and retry with feedback until requirements are met, use constraint() instead. The session runs constraints with parallel-check and combined-retry before output guards.
Audit trail
Every applied guard lands in the audit on result._meta.guardrails:
const result = await adapter.generate(prompt, opts)
result._meta.guardrails.blocked // false (would have thrown if true)
result._meta.guardrails.applied[0].guard // 'pii'
result._meta.guardrails.applied[0].category // 'pii' (optional risk-type label)
result._meta.guardrails.applied[0].action // 'redact'
result._meta.guardrails.applied[0].original // original content before this guard
result._meta.guardrails.applied[0].durationMs // 2.3
Scoping
Guardrails support four scoping levels, merged via union. When two guards share a name, the more specific scope wins: per-call over per-prompt, per-prompt over per-context, and per-context over global.
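One way to picture "union, with precedence on name collisions" is a Map keyed by guard name, filled from lowest to highest precedence. This sketch rests on two assumptions. First, guards are identified by name. Second, a replacing guard keeps the position where that name first appeared, which is how Map.set behaves on an existing key. The real order of merged guards in Crux may differ.

```typescript
// Hypothetical shape: just enough to see which scope a merged guard came from.
interface ScopedGuard {
  name: string
  source: 'global' | 'context' | 'prompt' | 'call'
}

// Union of all four scopes; on a name collision the more specific scope wins.
function mergeScopes(
  globals: readonly ScopedGuard[],
  contextScope: readonly ScopedGuard[],
  promptScope: readonly ScopedGuard[],
  callScope: readonly ScopedGuard[],
): ScopedGuard[] {
  const byName = new Map<string, ScopedGuard>()
  // Lowest precedence first, so later scopes overwrite earlier entries.
  // Map.set on an existing key replaces the value but keeps its original position.
  for (const scope of [globals, contextScope, promptScope, callScope]) {
    for (const guard of scope) byName.set(guard.name, guard)
  }
  return Array.from(byName.values())
}
```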
Global (all generate calls)
import { createSafetyPlugin } from '@crux/core/safety'
config({
plugins: [createSafetyPlugin({ guardrails: [injectionGuard, piiGuard] })],
})
Per-prompt
const blogPrompt = prompt({
system: 'You are a blog writer...',
guardrails: [piiGuard, contentSafetyGuard],
// ...
})
Per-context
const medicalContext = context({
id: 'medical',
system: 'Patient data context...',
guardrails: [hipaaGuard],
})
// Any prompt that uses this context inherits hipaaGuard
const report = prompt({ use: [medicalContext], /* ... */ })
Per-call (highest precedence)
await adapter.generate(prompt, {
guardrails: [strictModeGuard],
})
Audit trail
Guardrail audit attaches to result._meta.guardrails:
const result = await adapter.generate(prompt, opts)
result.text // safe (redacted/transformed) content
result._meta.guardrails // { applied: [...], blocked: false }
GuardrailBlockedError
Thrown when a guard blocks content:
import { GuardrailBlockedError } from '@crux/core/safety'
try {
await adapter.generate(prompt, opts)
} catch (e) {
if (e instanceof GuardrailBlockedError) {
e.guardrailId // 'injection-detection'
e.phase // 'input'
e.reason // 'Prompt injection detected (score: 0.85)'
} else {
throw e // not a guardrail block: let it propagate
}
}
Streaming
Buffer strategies
Guards declare how they interact with streaming:
| Strategy | Use case | Behavior |
|---|---|---|
| buffer: 'none' | Real-time transforms (URL expansion, import fixes) | onChunk runs per chunk. Client sees results immediately |
| buffer: 'full' | Post-stream validation (PII, safety, formatting) | Chunks accumulate. validate runs once after the stream completes |
Streaming guardrails run automatically in every adapter's stream() — register the guard and stream as usual:
const handle = await adapter.stream(prompt, { model, input, guardrails: [iconFixer, piiGuard] })
// Consumers see only the guarded stream; originals land in the audit on completion meta.
Mixed strategies work together. Each guard runs independently: buffer: 'none' guards process chunks in real time while buffer: 'full' guards wait for completion. Constraints run report-only at end-of-stream (a live stream cannot regenerate).
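The two buffer strategies can be modeled over an array of chunks. bufferNone and bufferFull are hypothetical, synchronous helpers, not Crux internals. They make two assumptions:

- A buffer: 'full' guard withholds output until validate has seen the complete text.
- The accumulated argument includes the current chunk.

```typescript
type Out =
  | { action: 'pass' }
  | { action: 'transform' | 'redact'; content: string }

// buffer: 'none': onChunk runs on every chunk and its result is emitted at once.
function bufferNone(
  chunks: readonly string[],
  onChunk: (chunk: string, accumulated: string) => Out,
): string[] {
  const emitted: string[] = []
  let accumulated = ''
  for (const chunk of chunks) {
    accumulated += chunk // assumption: includes the current chunk
    const r = onChunk(chunk, accumulated)
    emitted.push(r.action === 'pass' ? chunk : r.content)
  }
  return emitted
}

// buffer: 'full': chunks accumulate and validate runs once after the stream ends.
function bufferFull(chunks: readonly string[], validate: (text: string) => Out): string[] {
  const text = chunks.join('')
  const r = validate(text)
  return [r.action === 'pass' ? text : r.content] // one emission, after completion
}
```

Feeding bufferFull an email address split across two chunks (`'mail john@exa'`, `'mple.com'`) shows why PII checks use 'full'. Neither chunk alone contains a complete address, but the joined text does.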
Works with any text stream
For custom dialects or non-Crux streams, the session exposes the same protocol directly — safety.openStream().transform() returns a standard Web Streams API TransformStream<string, string>:
import { createSafety } from '@crux/core/safety'
const safety = createSafety({ call: { guardrails: [iconFixer, piiGuard] }, promptId, model })
const guarded = anyTextStream.pipeThrough(safety.openStream().transform())
The guards themselves are also standalone: guardrail() has no adapter dependencies. You can use @crux/core/safety purely for the safety primitives without the rest of Crux.
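Because openStream().transform() is a plain TransformStream<string, string>, the buffer: 'full' strategy maps directly onto the Web Streams API: accumulate in transform(), then validate once in flush().

The sketch below is a hypothetical single-guard illustration. fullBufferGuard is not a Crux export. It needs a runtime with a global TransformStream, such as Node 18+ or a modern browser.

```typescript
// Hypothetical single-guard sketch; fullBufferGuard is not a Crux export.
function fullBufferGuard(validate: (text: string) => string): TransformStream<string, string> {
  let buffered = ''
  return new TransformStream<string, string>({
    transform(chunk) {
      buffered += chunk // accumulate; emit nothing yet
    },
    flush(controller) {
      controller.enqueue(validate(buffered)) // runs once, after the source closes
    },
  })
}
```

Usage follows the same shape as the createSafety example: `someStream.pipeThrough(fullBufferGuard(redactEmails))`, where redactEmails is your own string-to-string function.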
onChunk handler
For buffer: 'none' guards:
const urlExpander = guardrail({
name: 'url-expander',
phase: 'output',
stream: { buffer: 'none' },
onChunk: async (chunk, accumulated, ctx) => {
// chunk: current text (may include held content from previous hold)
// accumulated: everything received so far
const expanded = chunk.replaceAll('__URL_1__', 'https://example.com/long-url')
if (expanded !== chunk) return { action: 'transform', content: expanded }
return { action: 'pass' }
},
validate: async () => ({ action: 'pass' }),
})
The hold pattern
When a transform needs to see a multi-token pattern (like an import statement that arrives across two chunks), return { action: 'hold' }:
LLM sends: "import { BadIcon }"
→ onChunk("import { BadIcon }")
→ guard: incomplete import — no 'from' clause yet
→ return { action: 'hold' }
→ pipeline: buffers, emits nothing to client
LLM sends: " from 'lucide-react'\n"
→ onChunk("import { BadIcon } from 'lucide-react'\n") ← held + new, merged
→ guard: complete import! transform it
→ return { action: 'transform', content: "import { GoodIcon } from 'lucide-react'\n" }
→ pipeline: emits transformed content to client
How it works:
- hold tells the pipeline "I need more data before I can decide"
- The held content is stored in a per-guard buffer and prepended to the chunk on the next call
- The guard sees progressively bigger chunks until it can act
- Multiple consecutive holds just grow the buffer
- On stream end, any held content flushes unchanged (graceful degradation)
- Each guard's hold buffer is independent — one guard holding doesn't block others
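The hold rules above fit in a small driver loop. runWithHold is a hypothetical, synchronous sketch for a single guard, not Crux's pipeline. It shows three behaviors: held text is prepended to the next chunk, consecutive holds grow the buffer, and held content flushes unchanged at stream end.

```typescript
type ChunkResult =
  | { action: 'pass' }
  | { action: 'hold' }
  | { action: 'transform'; content: string }

// Drives one guard's onChunk over a sequence of chunks with hold semantics.
function runWithHold(chunks: readonly string[], onChunk: (chunk: string) => ChunkResult): string[] {
  const emitted: string[] = []
  let held = '' // this guard's own hold buffer
  for (const next of chunks) {
    const chunk = held + next // held text is prepended to the new chunk
    const r = onChunk(chunk)
    if (r.action === 'hold') {
      held = chunk // consecutive holds just grow the buffer
      continue
    }
    held = ''
    emitted.push(r.action === 'transform' ? r.content : chunk)
  }
  // Stream end: anything still held flushes unchanged.
  if (held !== '') emitted.push(held)
  return emitted
}
```

With an onChunk that holds any `import {` lacking a `from` clause, the two chunks from the trace above come out as a single transformed import line.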
The guard stays stateless. It doesn't track what was held. It just looks at chunk (which may be bigger than a single LLM token) and decides:
const iconFixer = guardrail({
name: 'icon-fixer',
phase: 'output',
stream: { buffer: 'none' },
onChunk: async (chunk) => {
// Incomplete import? Hold — don't emit yet.
if (/import\s*\{/.test(chunk) && !/from\s*['"][^'"]*['"]/.test(chunk)) {
return { action: 'hold' }
}
// Complete import for lucide-react? Fix invalid icons.
const match = chunk.match(/import\s*\{([^}]+)\}\s*from\s*['"]lucide-react['"]/)
if (match) {
// fixSpecifiers: your own helper (not shown) that repairs the specifier list
const fixed = match[0].replace(match[1]!, fixSpecifiers(match[1]!))
return { action: 'transform', content: chunk.replace(match[0], fixed) }
}
// Not an import — pass through.
return { action: 'pass' }
},
validate: async () => ({ action: 'pass' }),
})
Error recovery
Guard failures must never break the stream. Catch exceptions and pass through unchanged:
onChunk: async (chunk) => {
try {
return { action: 'transform', content: riskyTransform(chunk) }
} catch {
return { action: 'pass' } // guard failed, pass original
}
}
Testing
evaluateGuardrail()
Test guards against a matrix of cases:
import { evaluateGuardrail } from '@crux/core/safety'
const report = await evaluateGuardrail(piiGuard, [
{ input: 'Email me at john@example.com', expect: 'redact' },
{ input: 'SSN: 123-45-6789', expect: 'redact' },
{ input: 'Hello world', expect: 'pass' },
])
report.summary // { total: 3, passed: 3, failed: 0 }
report.results[0] // { input: '...', passed: true, action: 'redact', expected: 'redact', durationMs: 1.2 }
Errors in the guard are caught gracefully: the case reports action: 'error' with the message.
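The report shape can be reproduced with a small table-driven loop. The local evaluate function below is a hypothetical, synchronous sketch of the idea. evaluateGuardrail itself is async and also records durationMs, which this sketch omits.

```typescript
type Action = 'pass' | 'block' | 'redact' | 'transform' | 'warn'
interface Case {
  input: string
  expect: Action
}
interface CaseResult {
  input: string
  passed: boolean
  action: Action | 'error'
  expected: Action
  error?: string
}

function evaluate(validate: (content: string) => { action: Action }, cases: readonly Case[]) {
  const results = cases.map((c): CaseResult => {
    try {
      const { action } = validate(c.input)
      return { input: c.input, passed: action === c.expect, action, expected: c.expect }
    } catch (e) {
      // A throwing guard becomes an 'error' result instead of aborting the run.
      return { input: c.input, passed: false, action: 'error', expected: c.expect, error: String(e) }
    }
  })
  const passed = results.filter((r) => r.passed).length
  return { summary: { total: results.length, passed, failed: results.length - passed }, results }
}
```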
isGuardrail()
Runtime type guard:
import { isGuardrail } from '@crux/core/safety'
isGuardrail(piiGuard) // true
isGuardrail({ _tag: 'Prompt' }) // false
Devtools & Observability
Guardrails emit events to the devtools protocol:
| Event | When | Key fields |
|---|---|---|
| guardrail:run | Every guard execution | guardrailId, phase, action, reason, durationMs |
These events:
- Appear in the devtools timeline alongside traces, tool calls, and memory events
- Are emitted as OTel spans when @crux/otel is active
- Are wired through InstrumentationHooks (onGuardrailRun)
Use InstrumentationHooks.onGuardrailRun plus the audit on result._meta.guardrails for application-level logging and EU AI Act audit trails. The optional category field on each policy lets reporting aggregate by risk type (pii, jailbreak, toxicity, ...) instead of by policy name.
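Aggregating the audit by risk type is a simple group-by over the entries in result._meta.guardrails.applied. In this sketch, the entry shape copies the audit fields shown earlier. Falling back to the guard name when category is missing is an assumption, not documented behavior.

```typescript
interface AppliedEntry {
  guard: string
  category?: string
  action: string
  durationMs: number
}

// Count applied guards per risk category; fall back to the guard name (assumption).
function countByCategory(entries: readonly AppliedEntry[]): Record<string, number> {
  const counts: Record<string, number> = {}
  for (const e of entries) {
    const key = e.category ?? e.guard
    counts[key] = (counts[key] ?? 0) + 1
  }
  return counts
}
```

Collect the applied arrays from each result and pass them in. Two differently named PII guards then roll up under a single pii count.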
Recipes
Production-grade guardrail implementations usually combine the patterns below with the broader Safety guide:
- PII Detection & Redaction — Presidio-inspired recognizer pattern with context-enhanced confidence, Luhn validation, 3 redaction strategies
- Prompt Injection Detection — 20+ regex patterns, multi-signal scoring, Unicode normalization, 5-layer defense model
- Content Safety — Three-tier escalation (word-list, API classifier, LLM-as-judge), EU AI Act compliance
- Streaming Transforms — v0-style LLM Suspense + Autofixer patterns with hold for cross-chunk fixes