Week 05 — Deep Q-Networks and Variants

When the state space is too large for a table: DQN, Atari, and the deep-learning era of RL.

Lecture

Function approximation in RL · the deadly triad (function approximation + bootstrapping + off-policy learning) · DQN (Mnih et al. 2015) · experience replay · target networks · Double DQN, Dueling DQN, Rainbow.
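As a rough orientation for the lecture topics, the sketch below contrasts the bootstrap target used by DQN with the one used by Double DQN. It is a minimal illustration in plain Python, not the lecture's reference implementation: the function names, the list-of-lists layout for Q-values, and the default `gamma` are assumptions made for this example.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q_target: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """DQN target: y = r + gamma * max_a Q_target(s', a).

    The target network both selects and evaluates the next action.
    Terminal transitions (done=True) do not bootstrap.
    """
    return [
        r + (0.0 if done else gamma * max(q_next))
        for r, done, q_next in zip(rewards, dones, next_q_target)
    ]


def double_dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q_online: Sequence[Sequence[float]],
    next_q_target: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    The online network selects the action, the target network evaluates it,
    which reduces the overestimation caused by taking a max over noisy values.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, next_q_online, next_q_target):
        if done:
            targets.append(r)
        else:
            best_action = max(range(len(q_on)), key=q_on.__getitem__)
            targets.append(r + gamma * q_tg[best_action])
    return targets


if __name__ == "__main__":
    # Hypothetical batch of two transitions with two actions each.
    rewards = [1.0, 0.0]
    dones = [False, True]
    q_online = [[1.0, 0.2], [0.0, 0.0]]
    q_target = [[0.5, 2.0], [3.0, 1.0]]
    print(dqn_targets(rewards, dones, q_target, gamma=0.5))
    print(double_dqn_targets(rewards, dones, q_online, q_target, gamma=0.5))
```

In the first transition the two targets differ because the online network prefers action 0 while the target network's largest value sits on action 1; that gap is the kind of overestimation Double DQN is designed to dampen.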

Read before the lecture

Problem set

PS3 — DQN diagnostics

  1. Show why naive Q-learning with a neural function approximator can diverge. Construct a 2-state counterexample.
  2. Explain why the target network in DQN breaks the divergence cycle.
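
As background for question 2, the mechanics of a target network can be sketched as follows. This is only a reminder of what "target network" means operationally (a frozen copy of the online parameters, refreshed every few steps), not an answer to the question. The class name `TargetSync`, the flat list of parameters, and the plain gradient step are hypothetical simplifications for this example.

```python
from typing import List, Sequence


class TargetSync:
    """Keeps an online parameter vector and a periodically refreshed frozen copy."""

    def __init__(self, online_params: Sequence[float], sync_every: int) -> None:
        self.online: List[float] = list(online_params)
        self.target: List[float] = list(online_params)  # frozen copy
        self.sync_every = sync_every
        self.steps = 0

    def step(self, grads: Sequence[float], lr: float = 0.1) -> None:
        # Only the online parameters move on every update.
        self.online = [w - lr * g for w, g in zip(self.online, grads)]
        self.steps += 1
        # Hard update: copy online -> target every `sync_every` steps.
        if self.steps % self.sync_every == 0:
            self.target = list(self.online)


if __name__ == "__main__":
    nets = TargetSync([1.0], sync_every=3)
    for _ in range(2):
        nets.step([1.0], lr=0.5)
    print(nets.online, nets.target)  # target is still the initial copy
    nets.step([1.0], lr=0.5)
    print(nets.online, nets.target)  # target was refreshed on the third step
```

Between syncs, the bootstrap targets are computed from `target`, so they stay fixed while `online` is being fitted to them.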

Reference text for this week: chapter 05 of the bilingual notes — EN PDF · FR PDF.