RL · schedule · Week 08 of 12

Week 08 — State-of-the-Art Algorithms

What labs and industry are using in 2026: PPO, SAC, MuZero, IMPALA, and a tour of recent improvements.

Lecture

PPO and SAC in production · MuZero (Schrittwieser et al. 2020) and learned-model RL · IMPALA (Espeholt et al. 2018) for distributed training · the AlphaZero / MuZero line · offline RL (Levine et al. 2020).

Read before the lecture

Schrittwieser et al., *Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model* (Nature 2020, the MuZero paper)

Recitation — paper discussion

Levine et al., *Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems* (2020)

Come ready to argue one side of each:

When is offline RL the right tool and when isn't it?
What are the failure modes of behavior-cloning baselines?

Reference text for this week: chapter 08 of the bilingual notes — EN PDF · FR PDF.