Week 10 — Model Monitoring in Production
A deployed model does not self-regulate. It drifts, finds shortcuts, gets exposed to populations the training data never saw.
Lecture
Performance metrics in production (latency, throughput, error rate) · data drift detection (Kolmogorov-Smirnov, Wasserstein, Population Stability Index) · concept drift · alerting (Grafana, Prometheus, PagerDuty) · the Amazon hiring-model failure as a case study · fairness monitoring (Fairlearn, Evidently).
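As a warm-up for the drift-detection part of the lecture, here is a minimal sketch of two of the statistics named above: the Population Stability Index and the two-sample Kolmogorov-Smirnov statistic. It uses only the Python standard library. The function names, the quantile binning, the `eps` smoothing constant, and the synthetic Gaussian data are all assumptions made for illustration; they are not taken from the course materials, and production libraries may bin or smooth differently.

```python
import math
import random
from bisect import bisect_right


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two 1-D samples.

    Bin edges are quantiles of the reference sample (an assumed choice);
    eps keeps empty bins from producing log(0).
    """
    ref = sorted(reference)
    # n_bins - 1 interior cut points taken at reference quantiles.
    edges = [ref[int(len(ref) * i / n_bins)] for i in range(1, n_bins)]

    def fractions(sample):
        counts = [0] * n_bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        return [max(c / len(sample), eps) for c in counts]

    p, q = fractions(reference), fractions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The gap between two step functions can only change at data points.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


# Synthetic example (hypothetical data): one feature at training time,
# a fresh sample from the same distribution, and a mean-shifted sample.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(0.5, 1.0) for _ in range(5000)]

print("PSI same:", psi(train, same), "PSI shifted:", psi(train, shifted))
print("KS same:", ks_statistic(train, same), "KS shifted:", ks_statistic(train, shifted))
```

Both statistics should come out noticeably larger for the shifted sample than for the unshifted one. Where to put the alert threshold is a separate decision that depends on the feature and on how much noise the window size introduces.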
Read before the lecture
Recitation — paper discussion
Dastin, *Amazon scraps secret AI recruiting tool that showed bias against women* (Reuters 2018) (paper)
Come ready to argue one side of each:
- What would the monitoring system have looked like that caught this?
- Is monitoring sufficient, or is the problem upstream in training data?
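One way to make the first question concrete is a selection-rate check over a window of logged decisions. The sketch below is a starting point for the discussion, not a description of any system Amazon ran. It assumes the monitoring pipeline has access to a group label for each candidate, which is itself a strong assumption. The function names, the `(group, selected)` record layout, and the 0.8 threshold (chosen to echo the four-fifths rule of thumb) are hypothetical.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Map each group to the fraction of its candidates that were selected.

    `decisions` is an iterable of (group, selected) pairs; this record
    layout is an assumption for the example.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, selected in decisions:
        totals[group] += 1
        positives[group] += int(selected)
    return {g: positives[g] / totals[g] for g in totals}


def disparity_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


def check_window(decisions, min_ratio=0.8):
    """Summarise one monitoring window and flag it if the ratio is too low."""
    rates = selection_rates(decisions)
    ratio = disparity_ratio(rates)
    return {"rates": rates, "ratio": ratio, "alert": ratio < min_ratio}


# Hypothetical window: group A is selected at 30%, group B at 15%.
window = (
    [("A", True)] * 30 + [("A", False)] * 70
    + [("B", True)] * 15 + [("B", False)] * 85
)
report = check_window(window)
print(report)
```

A check like this only sees outcomes. It cannot tell whether the gap comes from the model, from the applicant pool, or from the training data, and that is exactly what the second question asks you to argue about.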
Reference text for this week: chapter 10 of the bilingual notes — EN PDF · FR PDF.