Week 08 — State-of-the-Art Algorithms
What labs and industry are using in 2026: PPO, SAC, MuZero, IMPALA, and a tour of recent improvements.
Lecture
PPO and SAC in production · MuZero (Schrittwieser et al. 2020) and learned-model RL · IMPALA (Espeholt et al. 2018) for distributed training · the AlphaZero / MuZero line · offline RL (Levine et al. 2020).
Read before the lecture
Recitation — paper discussion
Levine et al., *Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems* (2020) (paper)
Come ready to argue one side of each:
- When is offline RL the right tool and when isn't it?
- What are the failure modes of behavior-cloning baselines?
Reference text for this week: chapter 08 of the bilingual notes — EN PDF · FR PDF.