Week 08 — Unsupervised Learning — Clustering
Lloyd 1957 at Bell Labs: partition signals into groups, pick a representative, iterate. The most-taught clustering algorithm in ML.
Week 08 — Unsupervised Learning — Clustering
Lloyd 1957 at Bell Labs: partition signals into groups, pick a representative, iterate. The most-taught clustering algorithm in ML.
Lecture
$k$-means and Lloyd’s algorithm · hierarchical clustering (linkage methods, dendrograms) · DBSCAN · Gaussian mixture models via EM · choosing $k$ (silhouette, gap statistic, BIC).
Read before the lecture
- Hastie, Tibshirani, Friedman, chapter 14
Problem set
PS4 — Clustering theory
- Prove that $k$-means converges in finite steps.
- Show that $k$-means is sensitive to initialization. Construct an example where two initializations give substantially different clusterings.
- Implement DBSCAN from scratch. Compare with sklearn on a public dataset.
Reference text for this week: chapter 08 of the bilingual notes — EN PDF · FR PDF.