
Week 09 — Multi-Agent Reinforcement Learning

When more than one agent learns at once: cooperation, competition, and the non-stationarity each agent faces because the others are learning too.

Lecture

Markov games as the generalization of MDPs · cooperative vs competitive vs general-sum settings · independent learning vs CTDE (centralized training, decentralized execution) · QMIX and MADDPG · multi-agent emergence.
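As a reference for the first two topics, a Markov game can be written in the same form as an MDP with one action set and one reward function per agent. The symbols below are a common convention and an assumption here; the lecture may use different notation.

```latex
\[
\mathcal{G} = \big(N,\ \mathcal{S},\ \{\mathcal{A}_i\}_{i=1}^{N},\ P,\ \{r_i\}_{i=1}^{N},\ \gamma\big),
\qquad
P(s' \mid s, a_1, \dots, a_N),
\qquad
r_i(s, a_1, \dots, a_N)
\]
\[
J_i(\pi_1, \dots, \pi_N) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t, a_{1,t}, \dots, a_{N,t})\Big]
\]
\[
\text{cooperative: } r_1 = \dots = r_N
\qquad
\text{two-player zero-sum: } r_1 = -r_2
\qquad
\text{general-sum: no constraint on } r_i
\]
```

With N = 1 this reduces to an ordinary MDP. Each agent i maximizes its own return J_i, which depends on every other agent's policy; that dependence is the source of the non-stationarity seen by an independent learner.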

Read before the lecture

Lowe et al., *Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments* (NeurIPS 2017, MADDPG)

Problem set

PS5 — Multi-agent fundamentals

Show that independent Q-learning agents in a simple 2-player Markov game can fail to find a Nash equilibrium.
Implement the Iterated Prisoner’s Dilemma with two Q-learning agents. Which policies emerge?
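One possible starting scaffold for the second exercise is sketched below. It is not a reference solution, and every design choice in it is an assumption rather than part of PS5: the payoff values (5, 3, 1, 0), the state encoding (each agent observes the previous joint action from its own viewpoint), the epsilon-greedy exploration, and the hyperparameters. The names `PAYOFF`, `greedy`, and `train` are hypothetical.

```python
import random

# Actions: 0 = cooperate, 1 = defect.
C, D = 0, 1

# Payoffs as (agent 0, agent 1). The values 5/3/1/0 are the conventional
# textbook choice and an assumption here, not something PS5 specifies.
PAYOFF = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}


def greedy(q_row):
    """Return the action with the larger Q-value; ties go to cooperate."""
    return C if q_row[C] >= q_row[D] else D


def train(episodes=2000, rounds=50, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Run two independent tabular Q-learners on the iterated game.

    Assumed state encoding: the previous joint action as (mine, theirs),
    with None as the state before the first round.
    """
    rng = random.Random(seed)
    states = [None] + list(PAYOFF)
    q = [{s: [0.0, 0.0] for s in states} for _ in range(2)]

    for _ in range(episodes):
        state = [None, None]
        for _ in range(rounds):
            # Epsilon-greedy action selection, independently per agent.
            acts = [
                rng.randrange(2) if rng.random() < epsilon else greedy(q[i][state[i]])
                for i in range(2)
            ]
            rewards = PAYOFF[(acts[0], acts[1])]
            # Each agent sees the joint action from its own viewpoint.
            nxt = [(acts[0], acts[1]), (acts[1], acts[0])]
            for i in range(2):
                # Standard Q-learning update; the last round of an episode
                # still bootstraps, i.e. the cut-off is treated as truncation.
                target = rewards[i] + gamma * max(q[i][nxt[i]])
                q[i][state[i]][acts[i]] += alpha * (target - q[i][state[i]][acts[i]])
            state = nxt
    return q


if __name__ == "__main__":
    q_tables = train()
    for i, table in enumerate(q_tables):
        policy = {s: "C" if greedy(row) == C else "D" for s, row in table.items()}
        print(f"agent {i}: {policy}")
```

Printing the greedy policy per state makes it easy to compare runs. Which policy emerges can depend on the seed, the discount factor, and the exploration rate, so it is worth varying those before answering the question.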

Reference text for this week: chapter 09 of the bilingual notes — EN PDF · FR PDF.