ML · schedule · Week 07 of 12

Week 07 — Ensemble Methods

The BellKor 2009 Netflix Prize win: 100+ predictors combined linearly. Most winning Kaggle solutions since have been ensembles too.
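A linear blend learns one weight per base predictor from that predictor's held-out predictions. The sketch below is minimal and makes several assumptions. It uses ordinary least squares with no intercept or regularization. It assumes the predictors are not perfectly collinear. It is not the BellKor team's actual blending procedure, and `fit_blend_weights` and `blend` are hypothetical names.

```python
from typing import List, Sequence


def fit_blend_weights(preds: Sequence[Sequence[float]], target: Sequence[float]) -> List[float]:
    """Least-squares blend weights w minimizing sum_i (target[i] - sum_k w[k] * preds[k][i])**2.

    preds[k][i] is base predictor k's held-out prediction for example i.
    Solves the normal equations (P^T P) w = P^T y by Gaussian elimination.
    """
    k, n = len(preds), len(target)
    # Gram matrix A = P^T P and right-hand side b = P^T y.
    a = [[sum(preds[r][i] * preds[c][i] for i in range(n)) for c in range(k)] for r in range(k)]
    b = [sum(preds[r][i] * target[i] for i in range(n)) for r in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (b[r] - sum(a[r][c] * w[c] for c in range(r + 1, k))) / a[r][r]
    return w


def blend(preds: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of base predictions, example by example."""
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(preds[0]))]


# Usage: the target is exactly 0.7 * p1 + 0.3 * p2, so the fit should recover those weights.
p1 = [1.0, 2.0, 3.0, 4.0]
p2 = [2.0, 0.0, 1.0, 3.0]
y = [0.7 * u + 0.3 * v for u, v in zip(p1, p2)]
print(fit_blend_weights([p1, p2], y))
```

The weights are fitted on predictions the base models did not train on. Fitting them on training-set predictions would reward whichever model overfits most.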

Lecture

Bagging and Breiman’s random forest (2001) · boosting (AdaBoost, gradient boosting) · stacking · the bias-variance decomposition of ensemble error · XGBoost (Chen and Guestrin, 2016) and the gradient-boosting (GBM) tooling stack.
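Under squared loss, the negative gradient of the loss is the residual. Gradient boosting therefore reduces to a simple loop: fit a weak learner to the current residuals, then add a shrunken copy of it to the model. The sketch below is minimal and works on 1-D data. It assumes depth-1 regression trees (stumps) as the weak learners and a fixed learning rate. All function names are hypothetical. Production libraries such as XGBoost also use second-order gradient information, regularization, and much faster split search.

```python
from typing import List, Tuple

# A depth-1 regression tree: (threshold, value if x <= threshold, value if x > threshold).
Stump = Tuple[float, float, float]


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Fit a one-split regression stump to targets r by minimizing squared error."""
    mean_r = sum(r) / len(r)
    best: Stump = (max(x), mean_r, mean_r)  # fallback: no useful split, predict the mean
    best_sse = sum((ri - mean_r) ** 2 for ri in r)
    for t in sorted(set(x))[:-1]:  # each threshold leaves at least one point on each side
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if sse < best_sse:
            best, best_sse = (t, lv, rv), sse
    return best


def predict_stump(s: Stump, xi: float) -> float:
    t, lv, rv = s
    return lv if xi <= t else rv


def gradient_boost(x: List[float], y: List[float], n_rounds: int = 50, learning_rate: float = 0.1):
    """Gradient boosting with loss L = (y - F)^2 / 2, whose negative gradient is y - F."""
    f0 = sum(y) / len(y)  # best constant model under squared loss
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient at the current F
        s = fit_stump(x, residuals)
        stumps.append(s)
        # Add a shrunken copy of the new weak learner to the ensemble.
        pred = [pi + learning_rate * predict_stump(s, xi) for pi, xi in zip(pred, x)]
    return f0, learning_rate, stumps


def predict(model, xi: float) -> float:
    f0, lr, stumps = model
    return f0 + lr * sum(predict_stump(s, xi) for s in stumps)


# Usage: a step function that one split can separate.
x = [float(i) for i in range(10)]
y = [0.0] * 5 + [1.0] * 5
model = gradient_boost(x, y)
print(predict(model, 2.5), predict(model, 7.5))
```

On this step data, every round chooses the split at x = 4. Each round then shrinks the remaining residual by the factor (1 − learning_rate). That is the shrinkage trade-off: a smaller learning rate needs more rounds to reach the same fit.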

Read before the lecture

Chen and Guestrin, *XGBoost: A Scalable Tree Boosting System* (KDD 2016)
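Two formulas from the reading are central. Both come from a second-order Taylor expansion of the loss, with an L2 penalty λ on leaf weights and a per-leaf cost γ. The first is the optimal leaf weight, w* = −G / (H + λ). The second is the gain from splitting a leaf, 1/2 [G_L²/(H_L+λ) + G_R²/(H_R+λ) − (G_L+G_R)²/(H_L+H_R+λ)] − γ. Here G and H are sums of the per-example first and second derivatives g and h of the loss at the current prediction. The sketch below assumes those g and h values are already computed. The function names are hypothetical, and this is not XGBoost's own code.

```python
from typing import Sequence


def leaf_weight(g: Sequence[float], h: Sequence[float], lam: float = 1.0) -> float:
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -sum(g) / (sum(h) + lam)


def split_gain(
    g_left: Sequence[float],
    h_left: Sequence[float],
    g_right: Sequence[float],
    h_right: Sequence[float],
    lam: float = 1.0,
    gamma: float = 0.0,
) -> float:
    """Loss reduction from splitting one leaf into left/right children, minus the leaf cost gamma."""

    def score(G: float, H: float) -> float:
        return G * G / (H + lam)

    gl, hl = sum(g_left), sum(h_left)
    gr, hr = sum(g_right), sum(h_right)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma


# Usage with squared loss (y - y_hat)^2 / 2: g = y_hat - y and h = 1.
# Current predictions are all 0, and the targets are y = [-1, -1, 1, 1].
g = [1.0, 1.0, -1.0, -1.0]
h = [1.0, 1.0, 1.0, 1.0]
print(split_gain(g[:2], h[:2], g[2:], h[2:]))  # clean split: positive gain
print(leaf_weight(g[:2], h[:2]))  # mean residual -1, shrunk toward 0 by lambda
```

With λ = 0, the leaf weight is the mean residual of the examples in that leaf. When a split's gain is below zero, for example when both children mix positive and negative gradients and γ > 0, the split is not worth making.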

Code lab

Lab 3 — Gradient boosting in production

Train XGBoost, LightGBM, and CatBoost on the same dataset. Tune hyperparameters. Compare training time, inference time, and accuracy. Audit feature importance with SHAP.
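SHAP attributions are Shapley values of a model's prediction. With only a few features, you can compute them exactly by enumerating every feature subset. That is a cheap way to sanity-check a library's output on a toy model. The sketch below assumes a simplified value function: features "absent" from a subset take their values from one baseline vector. The SHAP library's explainers define absent features differently, for example via background data or tree statistics. `shapley_values` is a hypothetical name.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    f: Callable[[List[float]], float], x: Sequence[float], baseline: Sequence[float]
) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to their baseline values.

    Needs 2^n model calls per feature, so it is only practical for a handful of features.
    """
    n = len(x)

    def value(subset: Set[int]) -> float:
        # Features in `subset` take their value from x; the rest come from the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for each coalition S that excludes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Usage: for a linear model, each attribution is w_i * (x_i - baseline_i).
linear = lambda z: 2 * z[0] - 3 * z[1] + 0.5 * z[2]
print(shapley_values(linear, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

The attributions always sum to f(x) − f(baseline). Checking that property, and checking that a pure interaction such as z0 * z1 is split evenly between its two features, are quick tests to run against a library's output.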

Notebook: lab03-boosting.ipynb · Dataset: Kaggle bank-loan default (Cameroon subset).

Reference text for this week: chapter 07 of the bilingual notes — EN PDF · FR PDF.