
Week 05 — Deep Q-Networks and Variants

When the state space is too large for a table: DQN, Atari, and the deep-learning era of RL.

Lecture

Function approximation in RL · the deadly triad (function approximation + bootstrapping + off-policy training) · DQN (Mnih et al. 2015) · experience replay · target networks · Double DQN, Dueling DQN, Rainbow.
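As a warm-up for the lecture, here is a minimal sketch of how the bootstrapped regression target differs between DQN and Double DQN. The function names, argument names, and array shapes are illustrative assumptions rather than code from the course or the paper: each `q_next_*` array is assumed to hold Q-values of shape (batch, actions) for the next states, and `done` is 1.0 for terminal transitions.

```python
import numpy as np

def dqn_target(reward, done, q_next_target, gamma=0.99):
    """DQN target: r + gamma * max_a Q_target(s', a), with no bootstrap at terminal states."""
    return reward + gamma * (1.0 - done) * q_next_target.max(axis=1)

def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online network selects the action,
    the target network evaluates it."""
    best_action = q_next_online.argmax(axis=1)
    evaluated = q_next_target[np.arange(len(best_action)), best_action]
    return reward + gamma * (1.0 - done) * evaluated

# Toy batch of two transitions (hypothetical numbers); the second is terminal.
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
q_next_target = np.array([[1.0, 3.0], [5.0, 2.0]])
q_next_online = np.array([[2.0, 0.0], [0.0, 1.0]])

y_dqn = dqn_target(reward, done, q_next_target, gamma=0.5)
y_ddqn = double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.5)
```

In the toy batch the two targets differ on the first transition because the online network prefers a different action than the one that maximizes the target network's estimate, which is the decoupling Double DQN relies on to reduce overestimation.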

Read before the lecture

Mnih et al., *Human-level control through deep reinforcement learning* (Nature 2015, the DQN paper)

Problem set

PS3 — DQN diagnostics

Show why naive Q-learning with a neural function approximator can diverge. Construct a 2-state counterexample.
Explain why the target network in DQN breaks the divergence cycle.
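One possible starting point for the counterexample is the well-known "w to 2w" construction from Sutton and Barto's textbook, sketched below. It uses a linear approximator with a single weight rather than a neural network, so it is a simplification and not the expected PS3 solution. The function name and default hyperparameters are assumptions chosen for illustration.

```python
def naive_semi_gradient_td(w0=1.0, alpha=0.1, gamma=0.9, steps=50):
    """Two states with values V(s1) = w and V(s2) = 2w.
    Only the transition s1 -> s2 (reward 0) is ever updated, which mimics
    an off-policy sampling distribution that never corrects V(s2)."""
    w = w0
    history = [w]
    for _ in range(steps):
        # Bootstrapped TD error: r + gamma * V(s2) - V(s1), with r = 0.
        td_error = 0.0 + gamma * (2.0 * w) - w
        # Semi-gradient step: the gradient of V(s1) with respect to w is 1.
        w += alpha * td_error * 1.0
        history.append(w)
    return history

diverging = naive_semi_gradient_td(gamma=0.9)   # 2 * gamma > 1: |w| grows
shrinking = naive_semi_gradient_td(gamma=0.4)   # 2 * gamma < 1: |w| decays
```

Each update multiplies w by 1 + alpha * (2 * gamma - 1), so the weight grows without bound whenever gamma exceeds 0.5. All three ingredients of the deadly triad are present: the estimate is parametric, the target bootstraps from V(s2), and the update distribution does not match any on-policy visitation.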

Reference text for this week: chapter 05 of the bilingual notes — EN PDF · FR PDF.