RL · schedule · Week 01 of 12

Week 01 — Markov Decision Processes

The mathematical scaffold under everything in RL: states, actions, transitions, rewards, and the principle of optimality.

Lecture

MDPs as 5-tuples $(\mathcal{S},\mathcal{A},P,r,\gamma)$ · policies and value functions · the Bellman equations · the principle of optimality · finite-horizon vs infinite-horizon vs average-reward formulations.
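
For orientation, one common way to write the two Bellman equations in the notation of the 5-tuple above is sketched here. It assumes the infinite-horizon discounted setting with $0 \le \gamma < 1$, a reward of the form $r(s,a)$, a transition kernel written $P(s' \mid s,a)$, and a stochastic policy $\pi(a \mid s)$; the symbols $V^\pi$ and $V^*$ are introduced for illustration, and the lecture may use a different convention (for example $r(s,a,s')$).

$$
V^{\pi}(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s) \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s,a)\, V^{\pi}(s') \Big]
$$

$$
V^{*}(s) = \max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s,a)\, V^{*}(s') \Big]
$$

The first equation is linear in $V^{\pi}$ and can be solved as a linear system for a fixed policy; the second is nonlinear because of the $\max$, and it is the one the principle of optimality is used to justify.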

Read before the lecture

Sutton and Barto, chapters 3–4

Problem set

PS1 — MDP fundamentals

Solve analytically for the optimal value function and an optimal policy of a 4×4 gridworld with stochastic transitions.
Prove the Bellman optimality equation from the principle of optimality.
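
A numerical cross-check can help when verifying an analytic answer to the gridworld problem. The sketch below runs value iteration on one possible 4×4 gridworld. The problem statement above does not fix the dynamics or rewards, so every constant here is an assumption made for illustration: the discount `GAMMA`, the slip probability `SLIP`, the absorbing goal cell `GOAL`, the per-step reward `STEP_REWARD`, and the rule that moving into a wall leaves the agent in place. The function names `build_mdp` and `value_iteration` are likewise hypothetical; replace all of these with the values given in PS1.

```python
import numpy as np

N = 4                # grid side length (from the problem statement)
GAMMA = 0.9          # assumed discount factor
SLIP = 0.2           # assumed probability that the move is chosen uniformly at random
GOAL = (3, 3)        # assumed absorbing goal cell (row, col)
STEP_REWARD = -1.0   # assumed reward for every step taken outside the goal
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def build_mdp():
    """Return P[s, a, s'] and r[s, a] for the assumed gridworld."""
    n_states, n_actions = N * N, len(MOVES)
    P = np.zeros((n_states, n_actions, n_states))
    r = np.full((n_states, n_actions), STEP_REWARD)
    goal = GOAL[0] * N + GOAL[1]
    for s in range(n_states):
        if s == goal:
            P[s, :, s] = 1.0   # the goal is absorbing with zero reward
            r[s, :] = 0.0
            continue
        row, col = divmod(s, N)
        for a in range(n_actions):
            for b, (dr, dc) in enumerate(MOVES):
                # Intended move gets 1 - SLIP; the slip mass is spread uniformly.
                prob = SLIP / n_actions + (1.0 - SLIP if a == b else 0.0)
                # Moves that would leave the grid keep the agent in place.
                nr = min(max(row + dr, 0), N - 1)
                nc = min(max(col + dc, 0), N - 1)
                P[s, a, nr * N + nc] += prob
    return P, r


def value_iteration(P, r, gamma=GAMMA, tol=1e-10):
    """Iterate the Bellman optimality backup until the sup-norm change is below tol."""
    V = np.zeros(P.shape[0])
    while True:
        Q = r + gamma * (P @ V)        # Q[s, a], shape (n_states, n_actions)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new


if __name__ == "__main__":
    P, r = build_mdp()
    V, policy = value_iteration(P, r)
    print(V.reshape(N, N).round(3))    # optimal values laid out on the grid
    print(policy.reshape(N, N))        # greedy action index per cell
```

The returned `policy` holds indices into `MOVES`. Because value iteration only approximates the fixed point to within `tol`, it is a check on the analytic solution and not a substitute for it.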

Reference text for this week: chapter 01 of the bilingual notes — EN PDF · FR PDF.