Week 05 — Deep Q-Networks and Variants
When the state space is too large for a table: DQN, Atari, and the deep-learning era of RL.
Lecture
Function approximation in RL · the deadly triad (bootstrapping + off-policy learning + function approximation) · DQN (Mnih et al. 2015) · experience replay · target networks · Double DQN, Dueling DQN, Rainbow.
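As a warm-up for the target-network and Double DQN topics, the sketch below compares how the two bootstrap targets are computed for a batch of transitions. It is a minimal NumPy illustration, not the lecture's reference implementation: the function names, the array layout (one row per transition, one column per action), and the toy numbers are all assumptions made for this example.

```python
import numpy as np

def dqn_target(reward, done, q_next_target, gamma=0.99):
    """Standard DQN target: r + gamma * max_a Q_target(s', a).

    The (1 - done) factor disables bootstrapping on terminal transitions.
    """
    return reward + gamma * (1.0 - done) * q_next_target.max(axis=1)

def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online network selects the action,
    the target network evaluates it."""
    best_action = q_next_online.argmax(axis=1)
    evaluated = q_next_target[np.arange(len(best_action)), best_action]
    return reward + gamma * (1.0 - done) * evaluated

# Toy batch of two transitions with two actions (hypothetical values).
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])  # second transition is terminal
q_next_online = np.array([[0.5, 2.0], [1.0, 0.0]])
q_next_target = np.array([[3.0, 1.0], [0.2, 0.4]])

y_dqn = dqn_target(reward, done, q_next_target, gamma=0.9)
y_ddqn = double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.9)
print(y_dqn, y_ddqn)
```

In the first transition the two targets differ because the online network prefers action 1 while the target network's maximum is at action 0. Decoupling selection from evaluation in this way is what Double DQN uses to reduce overestimation.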
Read before the lecture
Problem set
PS3 — DQN diagnostics
- Show why naive Q-learning with a neural function approximator can diverge. Construct a 2-state counterexample.
- Explain why the target network in DQN helps break the divergence cycle.
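One possible starting point for the counterexample is the classic two-state construction with a single shared weight: state 1 has value estimate w, state 2 has value estimate 2w, and the only transition ever updated is state 1 to state 2 with reward 0. The sketch below uses a linear approximator rather than a neural network, and the function name, step size, and discount values are assumptions chosen for illustration. Extending the idea to a neural approximator and to Q-values is left to the problem set.

```python
def simulate_td(w0=1.0, gamma=0.9, alpha=0.1, steps=50):
    """Semi-gradient TD(0) on the transition s1 -> s2 with reward 0.

    Value estimates: V(s1) = 1 * w, V(s2) = 2 * w (one shared weight).
    Only the s1 -> s2 transition is ever updated, which mimics an
    off-policy sampling distribution.
    """
    w = w0
    history = [w]
    for _ in range(steps):
        td_error = 0.0 + gamma * (2.0 * w) - (1.0 * w)
        w = w + alpha * td_error * 1.0  # gradient of V(s1) w.r.t. w is 1
        history.append(w)
    return history

diverging = simulate_td(gamma=0.9)  # per-step factor 1 + alpha * (2*gamma - 1) > 1
shrinking = simulate_td(gamma=0.4)  # per-step factor < 1
print(abs(diverging[-1]), abs(shrinking[-1]))
```

Each update multiplies w by 1 + alpha * (2 * gamma - 1), so the weight grows without bound whenever gamma > 0.5, regardless of how small the step size is.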
Reference text for this week: chapter 05 of the bilingual notes — EN PDF · FR PDF.