Reinforcement Learning — Cohort site
MDPs, dynamic programming, Q-learning, policy gradients, actor-critic, modern algorithms (PPO, SAC), and the new RLHF pipeline behind ChatGPT-class systems.
| Level | Graduate |
|---|---|
| Instructor | Dr. Yaé Ulrich Gaba |
| Meeting pattern | Mondays and Wednesdays, 14:00–16:00 lecture · Fridays 14:00–15:00 paper discussion (Africa/Lagos, UTC+1) |
Prerequisites
Probability and statistics at the level of expectation, variance, and conditional probability; linear algebra; and comfort with Python and PyTorch. No prior RL experience is required.
Grading
Five problem sets (35%) · three code labs (25%) · paper-discussion recitations (10%) · final project (30%).
Reading
Sutton and Barto, Reinforcement Learning: An Introduction (2nd ed., MIT Press 2018) — required reference.
What this site is and isn’t
The bilingual notes (linked above) are the reference text. This cohort site is the operational layer: each week page lists the lecture topic, the readings to complete beforehand, the problem set or code lab, and any paper discussion. The schedule and week pages are generated from a single data file (_data/apprentissage-renforcement.yml), so the same source drives the landing page, the schedule, and every week page. If you are following along outside a cohort, the week pages still work as a self-study guide: the deliverables become optional, but the readings and lecture topics are the same.