Reinforcement Learning — Cohort site

MDPs, dynamic programming, Q-learning, policy gradients, actor-critic, modern algorithms (PPO, SAC), and the new RLHF pipeline behind ChatGPT-class systems.

Self-study reference (no active cohort)

→ Weekly schedule · EN notes (PDF) · FR notes (PDF)
Level: Graduate
Instructor: Dr. Yaé Ulrich Gaba
Meeting pattern: Mondays and Wednesdays, 14:00–16:00 lecture · Fridays, 14:00–15:00 paper discussion (Africa/Lagos, UTC+1)

Prerequisites

Probability and statistics at the level of expectation, variance, and conditional probability. Linear algebra. Comfort with Python and PyTorch. No prior RL experience required.

Grading

Five problem sets (35%) · three code labs (25%) · paper-discussion recitations (10%) · final project (30%).

Reading

Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed., MIT Press 2018) — required reference.

What this site is and isn’t

The bilingual notes (linked above) are the reference text. This cohort site is the operational layer: every week page lists the lecture topic, the readings to do beforehand, the problem set or code lab, and any paper discussion. The schedule and week pages are generated from a single data file (_data/apprentissage-renforcement.yml), so the same source drives the landing page, the schedule, and every week page. If you are reading along without being in a cohort, the week pages still work as a self-study guide; the deliverables become optional, but the readings and lecture topics are the same.