
Week 10 — Constrained and Safe RL

When the agent must respect constraints (safety, fairness, resource limits) even during exploration.

Lecture

Constrained MDPs · Lagrangian methods · CPO (Achiam et al. 2017) · safety filters and shielding · the safety-exploration tension · the open question of safety in RLHF.

Read before the lecture

Achiam et al., *Constrained Policy Optimization* (ICML 2017)

Recitation — paper discussion

Bai et al., *Constitutional AI: Harmlessness from AI Feedback* (Anthropic 2022)

Come ready to argue one side of each:

Is constitutional AI a constrained-RL approach in disguise?
What's the constraint, and how is it enforced?

Reference text for this week: chapter 10 of the bilingual notes — EN PDF · FR PDF.