Week 08 — Unsupervised Learning — Clustering

Lloyd 1957 at Bell Labs: partition signals into groups, pick a representative, iterate. The most-taught clustering algorithm in ML.

ML  ·  schedule  ·  Week 08 of 12 ·  ← 07 ·  09 →

Week 08 — Unsupervised Learning — Clustering

Lloyd 1957 at Bell Labs: partition signals into groups, pick a representative, iterate. The most-taught clustering algorithm in ML.

Lecture

$k$-means and Lloyd’s algorithm · hierarchical clustering (linkage methods, dendrograms) · DBSCAN · Gaussian mixture models via EM · choosing $k$ (silhouette, gap statistic, BIC).

Read before the lecture

  • Hastie, Tibshirani, Friedman, chapter 14

Problem set

PS4 — Clustering theory

  1. Prove that $k$-means converges in finite steps.
  2. Show that $k$-means is sensitive to initialization. Construct an example where two initializations give substantially different clusterings.
  3. Implement DBSCAN from scratch. Compare with sklearn on a public dataset.

Reference text for this week: chapter 08 of the bilingual notes — EN PDF · FR PDF.