Reinforcement Learning — Schedule

The operational schedule for Reinforcement Learning. Per-cohort dates fill in at intake; the structure below is stable across cohorts.

The single source of truth is _data/apprentissage-renforcement.yml. Edits there flow through to this page automatically.


| Week | Title | Pitch | Detail |
|------|-------|-------|--------|
| 01 | Markov Decision Processes | The mathematical scaffold under everything in RL: states, actions, transitions, rewards, and the principle of optimality. | week 01 → |
| 02 | Dynamic Programming | When you have the model, you don't need to learn; you compute. The reference algorithms every learner approximates. | week 02 → |
| 03 | Monte Carlo and Temporal Difference Methods | When the model isn't known: learn from sampled trajectories. The MC / TD spectrum is the conceptual backbone of model-free RL. | week 03 → |
| 04 | Q-Learning and Temporal Difference Methods | The off-policy algorithm that unlocks RL: learn the optimal action-value function from any behavior policy. | week 04 → |
| 05 | Deep Q-Networks and Variants | When the state space is too large for a table: DQN, Atari, and the deep-learning era of RL. | week 05 → |
| 06 | Policy Gradient Methods | Directly optimize the policy, not the value function. The REINFORCE algorithm and its descendants. | week 06 → |
| 07 | Actor-Critic Methods | Combine policy gradients with a learned value function: the best of both worlds, and the foundation of most modern RL. | week 07 → |
| 08 | State-of-the-Art Algorithms | What labs and industry are using in 2026: PPO, SAC, MuZero, IMPALA, and a tour of recent improvements. | week 08 → |
| 09 | Multi-Agent Reinforcement Learning | When more than one agent learns at once: cooperation, competition, and the non-stationarity of every other agent. | week 09 → |
| 10 | Constrained and Safe RL | When the agent must respect constraints (safety, fairness, resource limits) even during exploration. | week 10 → |
| 11 | Applications | Three case studies that drove RL into the mainstream: games (AlphaGo line), robotics (Sim2Real), and RLHF for LLMs. | week 11 → |
| 12 | Final project presentations | Each participant trains an RL agent on an environment of their choice and presents results. | week 12 → |

Operational notes

  • Default timezone: Africa/Lagos (UTC+1). Per-cohort timing is negotiated at intake.
  • Lab notebooks and problem-set repos live in the cohort GitHub organization.
  • The bilingual lecture notes remain the reference text.