RL · schedule · Week 02 of 12 · ← 01 · 03 →

Week 02 — Dynamic Programming

When you have the model, you don't need to learn — you compute. The reference algorithms every learner approximates.

Lecture

Policy evaluation · policy improvement · policy iteration · value iteration · the contraction-mapping convergence proof · asynchronous DP · generalized policy iteration.

Read before the lecture

Sutton and Barto, chapter 4

Code lab

Value iteration on a real planning problem

Solve a 50-state inventory-management MDP by value iteration. Compare with a hand-designed heuristic policy.

Notebook: lab01-value-iteration.ipynb · Dataset: Synthetic — defined in the notebook.

Reference text for this week: chapter 02 of the bilingual notes — EN PDF · FR PDF.