Week 01 — Markov Decision Processes
The mathematical scaffold under everything in RL: states, actions, transitions, rewards, and the principle of optimality.
Lecture
MDPs as 5-tuples $(\mathcal{S},\mathcal{A},P,r,\gamma)$ · policies and value functions · the Bellman equations · the principle of optimality · finite-horizon vs infinite-horizon vs average-reward formulations.
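For orientation, the Bellman equations named above can be written in the notation of the 5-tuple. This is a sketch under assumed conventions that the lecture may not share: rewards are taken as $r(s,a)$, transitions as $P(s' \mid s,a)$, the policy $\pi(a \mid s)$ is stochastic, and the setting is discounted infinite-horizon.

$$
V^{\pi}(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s) \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s,a)\, V^{\pi}(s') \Big]
$$

$$
V^{*}(s) = \max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s,a)\, V^{*}(s') \Big]
$$

The first equation is the expectation form for a fixed policy. The second is the optimality form, where the average over the policy is replaced by a maximum over actions.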
Read before the lecture
Problem set
PS1 — MDP fundamentals
- Solve analytically for the optimal value function and optimal policy of a 4×4 gridworld with stochastic transitions.
- Prove the Bellman optimality equation from the principle of optimality.
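A numerical cross-check can help validate an analytic solution to the gridworld exercise. The sketch below runs value iteration on a 4×4 grid. The problem set does not specify the details here, so everything beyond the grid size is an assumption made for illustration: the goal cell, the step reward, the slip probability, the discount factor, and the helper names `step`, `build_model`, and `value_iteration`.

```python
import numpy as np

N = 4                # grid side length (from the problem statement)
GAMMA = 0.9          # assumed discount factor
SLIP = 0.2           # assumed probability that the move is replaced by a uniformly random one
GOAL = (3, 3)        # assumed absorbing goal cell
STEP_REWARD = -1.0   # assumed reward for every move taken outside the goal
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, move):
    """Apply a move deterministically; hitting a wall leaves the state unchanged."""
    r, c = state
    nr, nc = r + move[0], c + move[1]
    if 0 <= nr < N and 0 <= nc < N:
        return (nr, nc)
    return state


def build_model():
    """Return P[s, a, s'] and R[s, a] for the assumed gridworld."""
    n_states = N * N
    P = np.zeros((n_states, len(MOVES), n_states))
    R = np.zeros((n_states, len(MOVES)))
    for s in range(n_states):
        state = divmod(s, N)  # flat index -> (row, col)
        for a in range(len(MOVES)):
            if state == GOAL:
                P[s, a, s] = 1.0  # absorbing state with zero reward
                continue
            R[s, a] = STEP_REWARD
            for b, actual in enumerate(MOVES):
                # Intended move with prob 1 - SLIP, plus a uniform slip share.
                prob = (1.0 - SLIP) * (a == b) + SLIP / len(MOVES)
                nr, nc = step(state, actual)
                P[s, a, nr * N + nc] += prob
    return P, R


def value_iteration(P, R, gamma=GAMMA, tol=1e-10):
    """Iterate the Bellman optimality backup until the sup-norm change is below tol."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)  # Q[s, a] = r(s, a) + gamma * sum_s' P(s'|s, a) V(s')
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new


if __name__ == "__main__":
    P, R = build_model()
    V, policy = value_iteration(P, R)
    print(V.reshape(N, N).round(3))
    print(policy.reshape(N, N))  # indices into MOVES
```

Changing the constants at the top adapts the sketch to whatever dynamics the actual exercise specifies. The returned value function should then agree with the hand-derived one up to the tolerance.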
Reference text for this week: chapter 01 of the bilingual notes — EN PDF · FR PDF.