Week 10 — Constrained and Safe RL
When the agent must respect constraints (safety, fairness, resource limits) even during exploration.
Lecture
Constrained MDPs · Lagrangian methods · Constrained Policy Optimization (CPO; Achiam et al. 2017) · safety filters and shielding · the safety-exploration tension · the open question of safety in RLHF.
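As a reference point for the first two topics, the standard constrained-MDP problem and its Lagrangian relaxation can be sketched as follows. This is a generic textbook formulation and not necessarily the notation used in the lecture or the course notes; the symbols $J_r$, $J_c$, $r$, $c$, $d$, $\gamma$, $\lambda$ and $\eta$ are assumptions introduced here for illustration.

```latex
% Constrained MDP: maximize expected discounted reward
% subject to a budget d on expected discounted cost.
\begin{aligned}
\max_{\pi}\quad & J_r(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big] \\
\text{s.t.}\quad & J_c(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le d
\end{aligned}

% Lagrangian relaxation: a saddle-point problem over the policy and a multiplier.
\min_{\lambda \ge 0}\; \max_{\pi}\; \mathcal{L}(\pi, \lambda)
  = J_r(\pi) - \lambda \big( J_c(\pi) - d \big)

% Dual ascent on the multiplier with step size \eta > 0.
\lambda \leftarrow \max\!\big(0,\; \lambda + \eta \,( J_c(\pi) - d )\big)
```

Under these assumptions, a Lagrangian method alternates a policy-improvement step on $\mathcal{L}$ with the multiplier update above, which raises $\lambda$ while the cost constraint is violated and lowers it toward zero once the constraint is satisfied.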
Read before the lecture
Recitation — paper discussion
Bai et al., *Constitutional AI: Harmlessness from AI Feedback* (Anthropic 2022) (paper)
Come ready to argue one side of each:
- Is constitutional AI a constrained-RL approach in disguise?
- What's the constraint, and how is it enforced?
Reference text for this week: chapter 10 of the bilingual notes — EN PDF · FR PDF.