Week 02 — Introduction to machine learning
Module 2: framing ML problems, the train/val/test pipeline, leakage, evaluation metrics, learning curves.
What ML actually is, what it isn't, and the workflow that runs underneath every project.
What you ship this week
Notebook + 600-word writeup: a clean train/validation/test pipeline on a real hospital dataset, with calibration analysis and an explicit leakage audit.
| Due | Friday 18:00 Africa/Lagos (UTC+1) |
|---|---|
| Submission | Drop the repo URL into the week's cohort channel. Peer-review pairing announced Monday of next week. |
| Rubric | Pass / revise. Pass requires green CI, tests covering the public API, and a README a stranger can follow to install and run the code. |
Live sessions and labs
Default weekly cadence below. Cohort-specific dates and Zoom links are filled in at intake.
| Day | Time | Block | Recording |
|---|---|---|---|
| Mon | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Mon | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Tue | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Tue | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Wed | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Wed | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Thu | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Thu | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Fri | 10:00-11:00 | Industry speaker | (post-session) |
| Fri | 11:30-12:30 | Lab review | (post-session) |
| Fri | 14:00-15:00 | Cohort retrospective | (post-session) |
Learning outcomes
By the end of the week, every participant will:
- Frame a problem as supervised, unsupervised, or reinforcement learning — and recognize when none of these is the right framing.
- Build a clean train/validation/test pipeline that avoids leakage.
- Choose, fit, and evaluate a simple model on a real dataset.
- Read a learning curve and a confusion matrix without confusing yourself.
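To make the metric-reading outcome concrete, here is a minimal sketch of computing confusion-matrix counts and precision/recall by hand (function names are illustrative, not from a library; labels are assumed binary with 1 = positive):

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall(y_true, y_pred):
    """Precision = TP / predicted positives; recall = TP / actual positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Working through one small example by hand (and checking it against the formulas) is the fastest way to stop confusing the rows and columns of a confusion matrix.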
Topics covered
The supervised learning loop · loss functions and risk · empirical risk minimization · bias-variance · cross-validation, train/val/test splits, leakage · evaluation metrics for classification and regression · the no free lunch perspective on model choice.
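The split discipline above can be sketched in a few lines of plain Python (a toy sketch with illustrative names; in the labs you would typically reach for a library splitter instead). The key properties are a single seeded shuffle and three disjoint partitions:

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices once with a fixed seed, then cut three disjoint sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = [rows[i] for i in idx[:n_test]]
    val = [rows[i] for i in idx[n_test:n_test + n_val]]
    train = [rows[i] for i in idx[n_test + n_val:]]
    return train, val, test
```

Note what this sketch does not handle: grouped records (e.g. multiple admissions per patient) must be split by group, or the same patient leaks across partitions.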
Labs
Lab 1 — Patient readmission prediction
Build a logistic-regression pipeline predicting 30-day readmission. Evaluate not just by AUC but also by Brier score and a reliability diagram.
Dataset: MIMIC-IV demo (a clean 1,000-patient excerpt redistributed with the bootcamp).
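Both calibration checks in Lab 1 are small enough to compute by hand. A minimal sketch (function names are illustrative): the Brier score is the mean squared error between predicted probabilities and 0/1 outcomes, and a reliability diagram bins predictions and compares mean predicted probability to the observed positive rate in each bin.

```python
def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and binary outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def reliability_bins(y_true, p_pred, n_bins=10):
    """Per-bin (mean predicted prob, observed positive rate, count) tuples."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    out = []
    for cell in bins:
        if cell:
            mean_p = sum(p for _, p in cell) / len(cell)
            frac_pos = sum(y for y, _ in cell) / len(cell)
            out.append((mean_p, frac_pos, len(cell)))
    return out
```

Plotting `frac_pos` against `mean_p` gives the reliability diagram; a well-calibrated model hugs the diagonal.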
Lab 2 — Diagnose the broken evaluation
A notebook is supplied with three deliberate leakage points and two metric-misuse traps. Find them, fix them, and write a 200-word memo explaining each.
Dataset: Synthetic, supplied.
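As a hint at the kind of bug Lab 2 plants, here is a minimal sketch of one classic leakage pattern: preprocessing statistics computed before the split. The fix is always the same order of operations — split first, fit the transform on the training set only, then apply that fitted transform everywhere (names below are illustrative):

```python
def fit_standardizer(xs):
    """Fit mean/std on the given sample; return a transform closure."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    std = var ** 0.5 or 1.0  # guard against zero variance
    return lambda v: (v - mean) / std

# Wrong: standardizing on the full dataset, then splitting, leaks test
# statistics into training. Right (shown here): split first, fit on train only.
train, test = [1.0, 2.0, 3.0], [10.0]
scale = fit_standardizer(train)        # fit on the training split only
train_s = [scale(x) for x in train]
test_s = [scale(x) for x in test]      # transform the test split; never refit
```

The other leakage points in the notebook follow the same theme: any number that depends on validation or test rows must not influence anything fitted on training rows.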
Readings
Mandatory
- Before Tuesday. Hastie, Tibshirani, Friedman, *The Elements of Statistical Learning*, chapter 7 (sections 7.1-7.4)
- Before Wednesday. Andrew Ng, *Machine Learning Yearning*, chapters on dev/test splits and bias-variance
Optional deepening
- Sebastian Raschka, *Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning* (arXiv:1811.12808) — Comprehensive review of CV variants and significance tests.