Week 11 — Reproducibility in Research — Standards and Best Practices
Pineau and Henderson showed in 2017 that identical RL code run with different random seeds can produce learning curves that differ by as much as 3×. Reproducibility is engineering, not virtue.
Lecture
The reproducibility crisis (Nature 2016) · ML-specific reproducibility (Pineau and Henderson 2017) · the NeurIPS Reproducibility Checklist · Papers with Code · containerization, seeds, data versioning, dependency pinning · the production-ready research workflow.
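To make the seed and data-versioning items concrete, here is a minimal sketch using only the Python standard library. It is not taken from the lecture material: `noisy_run` is a hypothetical toy stand-in for a training run, and `fingerprint` and `summarize` are illustrative names. The sketch assumes that a run's outcome depends only on its seed, which real training code has to enforce explicitly (framework seeds, deterministic kernels, pinned dependencies).

```python
import hashlib
import random
import statistics


def noisy_run(seed: int, steps: int = 200) -> float:
    """Toy stand-in for a training run (hypothetical): a random walk
    whose final value depends only on the seed."""
    rng = random.Random(seed)  # local generator, no hidden global state
    score = 0.0
    for _ in range(steps):
        score += rng.gauss(0.1, 1.0)
    return score


def fingerprint(data: bytes) -> str:
    """SHA-256 content hash, usable as a minimal data-version identifier."""
    return hashlib.sha256(data).hexdigest()


def summarize(seeds):
    """Run once per seed and report the spread, not a single best run."""
    scores = [noisy_run(s) for s in seeds]
    return {
        "seeds": list(seeds),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


if __name__ == "__main__":
    # Same seed, same result: the run is repeatable.
    print(noisy_run(0) == noisy_run(0))
    # Different seeds: report the distribution across runs.
    print(summarize(range(5)))
    # Record which data the result was computed on.
    print(fingerprint(b"example dataset bytes"))
```

Reporting the seed list, the spread across seeds, and a hash of the data alongside every result is one simple way to let someone else check whether they are rerunning the same experiment.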
Read before the lecture
Recitation — paper discussion
Monya Baker, *1,500 scientists lift the lid on reproducibility* (Nature news feature, 2016)
Come ready to argue one side of each:
- Has ML reproducibility improved between 2016 and 2026?
- What would a useful reproducibility benchmark look like?
Reference text for this week: chapter 11 of the bilingual notes — EN PDF · FR PDF.