Week 01 — Markov Decision Processes

The mathematical scaffold under everything in RL: states, actions, transitions, rewards, and the principle of optimality.

Lecture

MDPs as 5-tuples $(\mathcal{S},\mathcal{A},P,r,\gamma)$ · policies and value functions · the Bellman equations · the principle of optimality · finite-horizon vs infinite-horizon vs average-reward formulations.
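As a reference point for the Bellman equations listed above, here is one common way to write them in the $(\mathcal{S},\mathcal{A},P,r,\gamma)$ notation. This is a sketch under stated assumptions, not necessarily the form used in the lecture: it assumes finite $\mathcal{S}$ and $\mathcal{A}$, a reward of the form $r(s,a)$, a stochastic policy $\pi(a \mid s)$, and the infinite-horizon discounted setting with $\gamma \in [0,1)$.

$$
V^{\pi}(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s) \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s,a)\, V^{\pi}(s') \Big]
$$

$$
V^{*}(s) = \max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s,a)\, V^{*}(s') \Big]
$$

The first equation is linear in $V^{\pi}$, so for a fixed policy it can be solved as a linear system. The second is nonlinear because of the $\max$, which is why the optimal value function is usually characterised as a fixed point.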

Read before the lecture

Problem set

PS1 — MDP fundamentals

  1. Derive analytically the optimal value function and an optimal policy for a 4×4 gridworld with stochastic transitions.
  2. Prove the Bellman optimality equation from the principle of optimality.
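
An analytical answer to problem 1 can be checked numerically with a short value-iteration sketch like the one below. The problem statement above does not specify the gridworld, so every detail here is an assumption made for illustration: an absorbing goal in the bottom-right cell, a reward of -1 per non-terminal step, a 0.2 probability of slipping to a perpendicular direction, walls that leave the agent in place, and $\gamma = 0.9$. The names `build_mdp` and `value_iteration` are hypothetical helpers, not part of any course code.

```python
import numpy as np

N = 4                      # grid side length (from the problem statement)
GOAL = (3, 3)              # ASSUMPTION: absorbing goal in the bottom-right cell
SLIP = 0.2                 # ASSUMPTION: total probability of a perpendicular slip
GAMMA = 0.9                # ASSUMPTION: discount factor
STEP_REWARD = -1.0         # ASSUMPTION: reward for every non-terminal step
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def step(cell, move):
    """Apply a move deterministically; hitting a wall leaves the agent in place."""
    r, c = cell[0] + move[0], cell[1] + move[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else cell


def build_mdp():
    """Return transition tensor P[s, a, s'] and reward matrix R[s, a]."""
    S, A = N * N, len(MOVES)
    P = np.zeros((S, A, S))
    R = np.full((S, A), STEP_REWARD)
    for s in range(S):
        cell = divmod(s, N)
        if cell == GOAL:
            P[s, :, s] = 1.0   # goal is absorbing
            R[s, :] = 0.0      # and yields no further reward
            continue
        for a in range(A):
            # Intended direction with prob 1 - SLIP, each perpendicular with SLIP / 2.
            outcomes = ((a, 1 - SLIP), ((a - 1) % A, SLIP / 2), ((a + 1) % A, SLIP / 2))
            for b, p in outcomes:
                nr, nc = step(cell, MOVES[b])
                P[s, a, nr * N + nc] += p
    return P, R


def value_iteration(P, R, gamma, tol=1e-10):
    """Iterate the Bellman optimality backup until the sup-norm change is below tol."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)          # Q[s, a] = r(s, a) + gamma * sum_s' P V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new


if __name__ == "__main__":
    P, R = build_mdp()
    V, policy = value_iteration(P, R, GAMMA)
    print(np.round(V.reshape(N, N), 3))   # optimal values laid out on the grid
    print(policy.reshape(N, N))           # greedy action index per cell (see MOVES)
```

Changing `SLIP`, `GAMMA`, or `STEP_REWARD` to match the actual problem specification should be enough to compare against a hand-derived solution. The returned policy is greedy with respect to the converged values, so ties between equally good actions are broken by action index.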

Reference text for this week: chapter 01 of the bilingual notes — EN PDF · FR PDF.