Reinforcement Learning — Schedule
Week-by-week schedule of Reinforcement Learning.
The operational schedule for Reinforcement Learning. Per-cohort dates are filled in at intake; the structure below is stable across cohorts.
The single source of truth is `_data/apprentissage-renforcement.yml`. Edits there flow through to this page automatically.
| Week | Title | Pitch | Detail |
|---|---|---|---|
| 01 | Markov Decision Processes | The mathematical scaffold under everything in RL: states, actions, transitions, rewards, and the principle of optimality. | week 01 → |
| 02 | Dynamic Programming | When you have the model, you don't need to learn — you compute. The reference algorithms every learner approximates. | week 02 → |
| 03 | Monte Carlo and Temporal Difference Methods | When the model isn't known: learn from sampled trajectories. The MC / TD spectrum is the conceptual backbone of model-free RL. | week 03 → |
| 04 | Q-Learning and Temporal Difference Methods | The off-policy algorithm that unlocks RL: learn the optimal action-value function from any behavior policy. | week 04 → |
| 05 | Deep Q-Networks and Variants | When the state space is too large for a table: DQN, Atari, and the deep-learning era of RL. | week 05 → |
| 06 | Policy Gradient Methods | Directly optimize the policy, not the value function. The REINFORCE algorithm and its descendants. | week 06 → |
| 07 | Actor-Critic Methods | Combine policy gradients with a learned value function — the best of both worlds, and the foundation of most modern RL. | week 07 → |
| 08 | State-of-the-Art Algorithms | What labs and industry are using in 2026: PPO, SAC, MuZero, IMPALA, and a tour of recent improvements. | week 08 → |
| 09 | Multi-Agent Reinforcement Learning | When more than one agent learns at once: cooperation, competition, and the non-stationarity of every other agent. | week 09 → |
| 10 | Constrained and Safe RL | When the agent must respect constraints (safety, fairness, resource limits) even during exploration. | week 10 → |
| 11 | Applications | Three case studies that drove RL into the mainstream: games (AlphaGo line), robotics (Sim2Real), and RLHF for LLMs. | week 11 → |
| 12 | Final project presentations | Each participant trains an RL agent on an environment of their choice and presents results. | week 12 → |
Operational notes
- Default timezone: Africa/Lagos (UTC+1). Per-cohort timing is negotiated at intake.
- Lab notebooks and problem-set repos live in the cohort GitHub organization.
- The bilingual lecture notes remain the reference text.