Week 10 — Constrained and Safe RL

When the agent must respect constraints (safety, fairness, resource limits) even during exploration.

Lecture

Constrained MDPs · Lagrangian methods · Constrained Policy Optimization (CPO; Achiam et al. 2017) · safety filters and shielding · the safety-exploration tension · the open question of safety in RLHF.
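
As a warm-up for the Lagrangian-methods part of the lecture, here is a minimal sketch on a toy problem; it is not taken from the course notes. It treats a two-armed bandit as a single-state constrained MDP and adjusts a Lagrange multiplier until the expected cost meets a budget. The rewards, costs, budget, and the entropy temperature `TAU` are all made-up assumptions, and the entropy term is added only so that the inner policy step has a smooth closed form.

```python
import math

# Toy single-state CMDP (a constrained bandit). All numbers are illustrative.
REWARDS = [1.0, 0.5]   # arm 0 pays more...
COSTS = [1.0, 0.0]     # ...but is the only arm that incurs a safety cost
BUDGET = 0.3           # constraint: expected cost <= BUDGET
TAU = 0.1              # entropy temperature (assumption, keeps the inner step smooth)


def softmax_policy(lam):
    """Arm distribution maximizing E[r - lam * c] + TAU * entropy."""
    scores = [(r - lam * c) / TAU for r, c in zip(REWARDS, COSTS)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def expected(values, policy):
    """Expectation of per-arm values under the policy."""
    return sum(v * p for v, p in zip(values, policy))


def solve_multiplier(steps=500, lr=0.1):
    """Raise lam while the constraint is violated, lower it otherwise."""
    lam = 0.0
    for _ in range(steps):
        policy = softmax_policy(lam)
        violation = expected(COSTS, policy) - BUDGET
        lam = max(0.0, lam + lr * violation)  # keep the multiplier non-negative
    return lam, softmax_policy(lam)


lam, policy = solve_multiplier()
print(f"lambda={lam:.3f} "
      f"cost={expected(COSTS, policy):.3f} "
      f"reward={expected(REWARDS, policy):.3f}")
```

The multiplier acts as a price on cost: it grows while the policy overspends the budget, which shifts probability toward the cost-free arm. In a full RL setting the closed-form policy step would be replaced by a policy-gradient update, and plain simultaneous updates of policy and multiplier can oscillate rather than settle.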

Read before the lecture

Recitation — paper discussion

Bai et al., *Constitutional AI: Harmlessness from AI Feedback* (Anthropic, 2022)

Come ready to argue one side of each:

  • Is constitutional AI a constrained-RL approach in disguise?
  • What's the constraint, and how is it enforced?

Reference text for this week: chapter 10 of the bilingual notes — EN PDF · FR PDF.