Week 02 — Dynamic Programming
When you have the model, you don't need to learn — you compute. The reference algorithms every learner approximates.
Week 02 — Dynamic Programming
When you have the model, you don't need to learn — you compute. The reference algorithms every learner approximates.
Lecture
Policy evaluation · policy improvement · policy iteration · value iteration · the contraction-mapping convergence proof · asynchronous DP · generalized policy iteration.
Read before the lecture
- Sutton and Barto, chapter 4
Code lab
Value iteration on a real planning problem
Solve a 50-state inventory-management MDP by value iteration. Compare with a hand-designed heuristic policy.
Notebook: lab01-value-iteration.ipynb · Dataset: Synthetic — defined in the notebook.
Reference text for this week: chapter 02 of the bilingual notes — EN PDF · FR PDF.