Week 08 — State-of-the-Art Algorithms

What labs and industry are using in 2026: PPO, SAC, MuZero, IMPALA, and a tour of recent improvements.

RL · Week 08 of 12

Lecture

PPO and SAC in production · MuZero (Schrittwieser et al. 2020) and model-based RL with learned models · IMPALA (Espeholt et al. 2018) for distributed training · the AlphaZero / MuZero line · offline RL (Levine et al. 2020).
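As a concrete anchor for the PPO part of the lecture, here is a minimal sketch of PPO's clipped surrogate objective, L_CLIP = mean over samples of min(r_t · A_t, clip(r_t, 1 - eps, 1 + eps) · A_t), where r_t is the ratio of the current policy's probability of the taken action to that of the policy that collected the data. It assumes per-sample log-probabilities and precomputed advantage estimates are already available; the function name `ppo_clip_objective` and the default `eps=0.2` are illustrative assumptions, not taken from these notes. It computes only the objective value; a training loop would evaluate it with an autodiff library and usually add value-function and entropy terms.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    Hypothetical helper for illustration only.
    logp_new / logp_old: log-probabilities of the taken actions under the
        current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    eps: clip range (0.2 here is an assumed default).
    """
    assert len(logp_new) == len(logp_old) == len(advantages) > 0
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                # r_t = pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(r_t, 1 - eps, 1 + eps)
        # Taking the minimum gives a pessimistic bound: the policy gains
        # nothing from moving the ratio outside [1 - eps, 1 + eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Usage: one sample with ratio 1.5 and advantage +1, one with ratio 0.5 and advantage -1.
print(ppo_clip_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0]))
```

With eps = 0.2, the first sample contributes 1.2 rather than 1.5, so there is no incentive to push its ratio further; the second contributes -0.8 rather than -0.5, because clipping is only applied when it makes the estimate more pessimistic.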

Read before the lecture

Recitation — paper discussion

Levine et al., *Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems* (2020)

Come ready to argue one side of each:

  • When is offline RL the right tool and when isn't it?
  • What are the failure modes of behavior-cloning baselines?

Reference text for this week: chapter 08 of the bilingual notes — EN PDF · FR PDF.