Week 02 — Dynamic Programming

When you have the model, you don't need to learn — you compute. The reference algorithms every learner approximates.

Lecture

Policy evaluation · policy improvement · policy iteration · value iteration · the contraction-mapping convergence proof · asynchronous DP · generalized policy iteration.
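
As a pointer to where the convergence proof is headed, the central statement can be written out as follows. This is a sketch in Sutton and Barto style notation, which is an assumption here (the lecture may use other symbols): $T$ is the Bellman optimality operator, $p$ the transition model, $r$ the reward, and $\gamma \in [0, 1)$ the discount factor.

```latex
% Bellman optimality operator on a value function v
(Tv)(s) = \max_{a} \sum_{s'} p(s' \mid s, a)\,\bigl[\, r(s, a, s') + \gamma\, v(s') \,\bigr]

% T is a gamma-contraction in the sup norm
\| Tv - Tu \|_\infty \le \gamma\, \| v - u \|_\infty

% Hence value iteration v_{k+1} = T v_k converges to the unique fixed point v_*
\| v_k - v_* \|_\infty \le \gamma^{k}\, \| v_0 - v_* \|_\infty
```

The same argument, with the max replaced by an expectation under a fixed policy, gives convergence of iterative policy evaluation.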

Read before the lecture

  • Sutton and Barto, chapter 4

Code lab

Value iteration on a real planning problem

Solve a 50-state inventory-management MDP by value iteration, then compare the resulting greedy policy with a hand-designed heuristic policy.
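
To give a feel for the lab before opening the notebook, here is a minimal sketch of value iteration on a 50-state inventory problem. It is not the notebook's MDP: the discount factor, prices, costs, uniform demand, and the base-stock heuristic (`base_stock_policy`, with its `reorder_point` and `target` parameters) are all hypothetical choices made for illustration.

```python
import numpy as np

N = 50                       # inventory levels 0..49, matching the lab's 50 states
GAMMA = 0.95                 # assumed discount factor
PRICE, UNIT_COST, HOLD_COST, FIXED_COST = 5.0, 2.0, 0.1, 4.0  # assumed economics
DEMAND_P = np.full(10, 0.1)  # assumed demand: uniform on 0..9 units per step


def build_model():
    """Return P[s, a, s'], R[s, a] and a mask of feasible (s, a) pairs."""
    P = np.zeros((N, N, N))
    R = np.zeros((N, N))
    valid = np.zeros((N, N), dtype=bool)
    for s in range(N):
        for a in range(N - s):           # an order may not exceed capacity
            valid[s, a] = True
            stock = s + a                # stock on hand after delivery
            order_cost = UNIT_COST * a + (FIXED_COST if a > 0 else 0.0)
            for d, p in enumerate(DEMAND_P):
                sold = min(stock, d)     # unmet demand is lost
                nxt = stock - sold
                P[s, a, nxt] += p
                R[s, a] += p * (PRICE * sold - HOLD_COST * nxt - order_cost)
    return P, R, valid


def value_iteration(P, R, valid, tol=1e-8, max_iters=10_000):
    """Repeat the Bellman optimality backup until the sup-norm change is below tol."""
    v = np.zeros(N)
    for _ in range(max_iters):
        q = np.where(valid, R + GAMMA * (P @ v), -np.inf)
        new_v = q.max(axis=1)
        done = np.max(np.abs(new_v - v)) < tol
        v = new_v
        if done:
            break
    # The returned policy is greedy with respect to the last backup.
    return v, q.argmax(axis=1)


def evaluate_policy(P, R, policy):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    idx = np.arange(N)
    P_pi, r_pi = P[idx, policy], R[idx, policy]
    return np.linalg.solve(np.eye(N) - GAMMA * P_pi, r_pi)


def base_stock_policy(reorder_point=5, target=15):
    """Hypothetical heuristic: when stock is at or below reorder_point, refill to target."""
    return np.array([target - s if s <= reorder_point else 0 for s in range(N)])


P, R, valid = build_model()
v_opt, pi_opt = value_iteration(P, R, valid)
v_heur = evaluate_policy(P, R, base_stock_policy())
print("largest value gap (optimal minus heuristic):", np.max(v_opt - v_heur))
```

Storing the model as dense arrays keeps each backup a single matrix product, which is affordable at 50 states; the heuristic is evaluated exactly with a linear solve so that the comparison is not blurred by iteration error.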

Notebook: lab01-value-iteration.ipynb  ·  Dataset: Synthetic — defined in the notebook.


Reference text for this week: chapter 02 of the bilingual notes — EN PDF · FR PDF.