Week 11 — Reproducibility in Research — Standards and Best Practices

Pineau and Henderson showed in 2017 that identical RL code run with different random seeds can produce learning curves that differ by a factor of 3. Reproducibility is engineering, not virtue.

Lecture

The reproducibility crisis (Nature 2016) · ML-specific reproducibility (Pineau and Henderson 2017) · the NeurIPS Reproducibility Checklist · Papers with Code · containerization, seeds, data versioning, dependency pinning · the production-ready research workflow.
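
As a rough illustration of three of the practices listed above (seeds, data versioning, and recording the environment), here is a minimal Python sketch that uses only the standard library. The names `seeded_run`, `data_fingerprint`, and `run_manifest` and the toy dataset are hypothetical and not part of the course materials. A real project would also seed NumPy or its deep learning framework and pin dependencies in a lock file.

```python
import hashlib
import json
import platform
import random
import sys


def seeded_run(seed: int, steps: int = 5) -> list[float]:
    """Stand-in for a training run: returns a toy 'learning curve'."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    return [rng.random() for _ in range(steps)]


def data_fingerprint(data: bytes) -> str:
    """SHA-256 digest of the raw dataset bytes, usable as a data version tag."""
    return hashlib.sha256(data).hexdigest()


def run_manifest(seed: int, data: bytes) -> dict:
    """Collect the minimum someone would need to rerun this experiment."""
    return {
        "seed": seed,
        "data_sha256": data_fingerprint(data),
        "python": platform.python_version(),
        "platform": sys.platform,
    }


if __name__ == "__main__":
    toy_data = b"x,y\n1,2\n3,4\n"  # hypothetical dataset contents
    print(json.dumps(run_manifest(seed=0, data=toy_data), indent=2))
    print(seeded_run(0) == seeded_run(0))  # same seed: curves match
    print(seeded_run(0) == seeded_run(1))  # different seed: curves differ
```

Saving the manifest next to each result makes the seed and the data version part of the result itself, so a later run can be checked against the same inputs.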

Read before the lecture

Recitation — paper discussion

Monya Baker, *1,500 scientists lift the lid on reproducibility* (Nature news feature, 2016) (paper)

Come ready to argue one side of each:

  • Has ML reproducibility improved between 2016 and 2026?
  • What would a useful reproducibility benchmark look like?

Reference text for this week: chapter 11 of the bilingual notes — EN PDF · FR PDF.