Program syllabus

AI Guardrails & Safety Engineering

Design, test, and ship guardrails that keep your AI assistants safe, compliant, and predictable — without grinding product velocity to a halt.

Program overview · Audio

A short overview of the program: who it's for, what we cover, and how to get the most value out of it as a busy professional.

Curriculum overview

This can run as an intensive 2-day workshop or stretched over 3–4 weeks with async work in between.

Module 1

Safety foundations & threat models

• Common failure modes: hallucination, leakage, abuse
• Framing harm: user, business, compliance, reputation
• Defining your “never events” and risk appetite
• Mapping flows where guardrails must sit

Module 2

Policies, taxonomies & rule systems

• Turning vague concerns into concrete policy text
• Designing label taxonomies for classifiers
• Rules vs. model-based filters: when to use each
• Aligning with legal / compliance without freezing

Module 3

Building the guardrail pipeline

• Input filters, output filters, and tool constraints
• Chaining rule engines, classifiers, and LLM checks
• Fallback behaviors: refuse, re-ask, hand off
• Logging the right signals for later review

Module 4

Red-teaming, golden sets & regression tests

• Structured red-teaming sessions with your own data
• Building and maintaining safety golden sets
• Automating checks in CI or scheduled jobs
• Reporting results to stakeholders in plain language

Module 5

Monitoring, dashboards & incident playbooks

• Choosing leading indicators beyond “error rate”
• Designing a simple guardrail dashboard
• Alert routing and escalation paths
• Incident review templates & change management

Capstone

Your guardrail design doc

Throughout the program, your team builds a concise guardrail design doc for one of your assistants. It captures threat models, policies, pipeline design, metrics, and incident playbooks. This becomes the artifact you can share with leadership, compliance, and future teammates.

1. Safety foundations & threat models

We define what “safe enough” means for your assistants — in language that product, legal, and engineering can all live with.

We catalog how your assistants can go wrong: from hallucinated instructions, to data leakage, to subtle reputation damage. Then we rank those risks by likelihood and impact so you know where to start.

• Mapping user journeys and guardrail touchpoints
• “Never events” vs. acceptable, mitigated risk
• Distinguishing product risk from compliance risk
• Translating risk appetite into concrete constraints

Artifact: assistant threat model

We co-author a light-weight threat model for one high-impact assistant, including:

• Key user journeys and failure modes
• Severity / likelihood matrix
• Proposed mitigations and open questions

This becomes the reference doc for both guardrail work and future product decisions.

2. Policies, taxonomies & rule systems

We turn fuzzy concerns into concrete labels, rules, and examples your models can actually learn from.

Instead of giant policy PDFs nobody reads, we design lean policies tied directly to decisions: allow, block, escalate, or log. Then we define labels and examples that make those policies machine-readable.

• Drafting policy snippets aimed at model behavior
• Designing label taxonomies for classifiers
• Balancing rule-based and model-based filters
• Working with legal without freezing delivery

Artifact: policy-to-label map

We build a short, structured mapping from policy statements to labels, rules, and example prompts:

• Policy clause → label(s) → action
• Positive and negative examples per label
• Open questions to resolve with stakeholders

3. Building the guardrail pipeline

We design a concrete pipeline that sits alongside your assistants, not bolted on as an afterthought.

• Input filters, output filters, and tool constraints
• Where to place checks in multi-step workflows
• Fallback behaviors: refuse, re-ask, or hand off
• Logging signals for auditing and debugging

Templates you can reuse

• Example guardrail pipeline diagram
• Checklist for adding a new guardrail
• “Safe fallback” pattern catalog

4. Red-teaming, golden sets & regression tests

We treat safety as an ongoing test suite, not a one-time review before launch.

We show you how to turn messy ad-hoc red-teaming sessions into a repeatable process that generates durable test cases and data.

• Designing structured red-teaming sessions
• Converting findings into golden test sets
• Running checks in CI and scheduled jobs
• Reporting results in clear, non-alarmist language

Artifact: safety golden set starter

We assemble an initial golden set for one assistant, including:

• High-risk prompts and expected behaviors
• Edge cases drawn from your real traffic
• Skeleton for adding new cases over time

5. Monitoring, dashboards & incident playbooks

We assume guardrails will eventually be stressed — and design how you’ll see it and respond when they are.

• Choosing leading indicators beyond “error rate”
• Designing a simple guardrail dashboard
• Defining alert thresholds and escalation paths
• Incident review templates and change management

Artifact: incident playbook

We create a reusable incident playbook for one representative failure scenario:

• Detection signals and owners
• Immediate containment steps
• Post-incident review checklist

Ready to harden your assistants with guardrails?

We usually run this as a focused engagement around one or two critical assistants. You bring real scenarios and constraints; we bring patterns, templates, and a shared language for safety.

By the end, you'll have a guardrail design doc, test sets, and monitoring plan your team can execute — without slowing product down.

Talk to us about this program View all teams programs

This pairs especially well with AI Assistant Observability & SLOs and Advanced Retrieval Engineering for a production-grade AI reliability stack.