Week 02 — Introduction to machine learning
Module 2: framing ML problems, the train/val/test pipeline, leakage, evaluation metrics, learning curves.
What ML actually is, what it isn't, and the workflow that runs underneath every project.
What you ship this week
Notebook + 600-word writeup: a clean train/validation/test pipeline on a real hospital dataset, with calibration analysis and an explicit leakage audit.
| Due | Friday 18:00 Africa/Lagos (UTC+1) |
|---|---|
| Submission | Drop the repo URL into the week's cohort channel. Peer-review pairing announced Monday of next week. |
| Rubric | Pass / revise. Pass requires green CI, tests covering the public API, and a README a stranger can follow to install and run the code. |
Live sessions and labs
Default weekly cadence below. Cohort-specific dates and Zoom links are filled in at intake.
| Day | Time | Block | Recording |
|---|---|---|---|
| Mon | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Mon | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Tue | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Tue | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Wed | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Wed | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Thu | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Thu | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Fri | 10:00-11:00 | Industry speaker | (post-session) |
| Fri | 11:30-12:30 | Lab review | (post-session) |
| Fri | 14:00-15:00 | Cohort retrospective | (post-session) |
Learning outcomes
By the end of the week, every participant will:
- Frame a problem as supervised, unsupervised, or reinforcement learning — and recognize when none of these is the right framing.
- Build a clean train/validation/test pipeline that avoids leakage.
- Choose, fit, and evaluate a simple model on a real dataset.
- Read a learning curve and a confusion matrix without confusing yourself.
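To make the metric-reading outcome concrete, here is a minimal sketch of computing confusion-matrix counts and precision/recall by hand (function names are illustrative, not from a library; labels are assumed binary with 1 = positive):

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall(y_true, y_pred):
    """Precision = TP / predicted positives; recall = TP / actual positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Working through one small example by hand (and checking it against the formulas) is the fastest way to stop confusing the rows and columns of a confusion matrix.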
Topics covered
The supervised learning loop · loss functions and risk · empirical risk minimization · bias-variance · cross-validation, train/val/test splits, leakage · evaluation metrics for classification and regression · the no free lunch perspective on model choice.
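The split discipline above can be sketched in a few lines of plain Python (a toy sketch with illustrative names; in the labs you would typically reach for a library splitter instead). The key properties are a single seeded shuffle and three disjoint partitions:

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices once with a fixed seed, then cut three disjoint sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = [rows[i] for i in idx[:n_test]]
    val = [rows[i] for i in idx[n_test:n_test + n_val]]
    train = [rows[i] for i in idx[n_test + n_val:]]
    return train, val, test
```

Note what this sketch does not handle: grouped records (e.g. multiple admissions per patient) must be split by group, or the same patient leaks across partitions.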
Labs
Lab 1 — Patient readmission prediction
Build a logistic-regression pipeline predicting 30-day readmission. Evaluate not just by AUC but also by Brier score and a reliability diagram.
Dataset: MIMIC-IV demo (a clean 1,000-patient excerpt redistributed with the bootcamp).
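Both calibration checks in Lab 1 are small enough to compute by hand. A minimal sketch (function names are illustrative): the Brier score is the mean squared error between predicted probabilities and 0/1 outcomes, and a reliability diagram bins predictions and compares mean predicted probability to the observed positive rate in each bin.

```python
def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and binary outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def reliability_bins(y_true, p_pred, n_bins=10):
    """Per-bin (mean predicted prob, observed positive rate, count) tuples."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    out = []
    for cell in bins:
        if cell:
            mean_p = sum(p for _, p in cell) / len(cell)
            frac_pos = sum(y for y, _ in cell) / len(cell)
            out.append((mean_p, frac_pos, len(cell)))
    return out
```

Plotting `frac_pos` against `mean_p` gives the reliability diagram; a well-calibrated model hugs the diagonal.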
Lab 2 — Diagnose the broken evaluation
A notebook is supplied with three deliberate leakage points and two metric-misuse traps. Find them, fix them, and write a 200-word memo explaining each.
Dataset: Synthetic, supplied.
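As a hint at the kind of bug Lab 2 plants, here is a minimal sketch of one classic leakage pattern: preprocessing statistics computed before the split. The fix is always the same order of operations — split first, fit the transform on the training set only, then apply that fitted transform everywhere (names below are illustrative):

```python
def fit_standardizer(xs):
    """Fit mean/std on the given sample; return a transform closure."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    std = var ** 0.5 or 1.0  # guard against zero variance
    return lambda v: (v - mean) / std

# Wrong: standardizing on the full dataset, then splitting, leaks test
# statistics into training. Right (shown here): split first, fit on train only.
train, test = [1.0, 2.0, 3.0], [10.0]
scale = fit_standardizer(train)        # fit on the training split only
train_s = [scale(x) for x in train]
test_s = [scale(x) for x in test]      # transform the test split; never refit
```

The other leakage points in the notebook follow the same theme: any number that depends on validation or test rows must not influence anything fitted on training rows.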
Readings
Mandatory
- Before Tuesday. Hastie, Tibshirani, Friedman, *The Elements of Statistical Learning*, chapter 7 (sections 7.1-7.4)
- Before Wednesday. Andrew Ng, *Machine Learning Yearning*, chapters on dev/test splits and bias-variance
Optional deepening
- Sebastian Raschka, *Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning* (arXiv:1811.12808) — Comprehensive review of CV variants and significance tests.