Week 02 — Introduction to machine learning

Module 2: framing ML problems, the train/val/test pipeline, leakage, evaluation metrics, learning curves.


What ML actually is, what it isn't, and the workflow that runs underneath every project.

What you ship this week

Notebook + 600-word writeup: a clean train/validation/test pipeline on a real hospital dataset, with calibration analysis and an explicit leakage audit.

Due: Friday 18:00 (Africa/Lagos, UTC+1)
Submission: Drop the repo URL into the week's cohort channel. Peer-review pairing is announced the following Monday.
Rubric: Pass / revise. A pass requires green CI, tests covering the public API, and a README a stranger can follow to install and run the code.

Live sessions and labs

Default weekly cadence below. Cohort-specific dates and Zoom links fill in at intake.

| Day | Time        | Block                                   | Recording    |
|-----|-------------|-----------------------------------------|--------------|
| Mon | 09:00-12:00 | Live instruction + code-along           | post-session |
| Mon | 14:00-16:00 | Independent lab work + TA office hours  | post-session |
| Tue | 09:00-12:00 | Live instruction + code-along           | post-session |
| Tue | 14:00-16:00 | Independent lab work + TA office hours  | post-session |
| Wed | 09:00-12:00 | Live instruction + code-along           | post-session |
| Wed | 14:00-16:00 | Independent lab work + TA office hours  | post-session |
| Thu | 09:00-12:00 | Live instruction + code-along           | post-session |
| Thu | 14:00-16:00 | Independent lab work + TA office hours  | post-session |
| Fri | 10:00-11:00 | Industry speaker                        | post-session |
| Fri | 11:30-12:30 | Lab review                              | post-session |
| Fri | 14:00-15:00 | Cohort retrospective                    | post-session |

Learning outcomes

By the end of the week, every participant will:

  1. Frame a problem as supervised, unsupervised, or reinforcement learning — and recognize when none of these is the right framing.
  2. Build a clean train/validation/test pipeline that avoids leakage.
  3. Choose, fit, and evaluate a simple model on a real dataset.
  4. Read a learning curve and a confusion matrix without confusing yourself.
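Outcome 2 can be sketched in a few lines with scikit-learn. The 60/20/20 split ratios and the toy dataset below are illustrative, not a course requirement:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data standing in for a real dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Carve off the test set first, then split the remainder into train/validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0, stratify=y_trainval
)  # 0.25 of the remaining 80% = 20% of the full data

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

The test set is split off first and never touched again until the final evaluation; the validation set absorbs all model-selection decisions.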

Topics covered

The supervised learning loop · loss functions and risk · empirical risk minimization · the bias-variance trade-off · cross-validation, train/val/test splits, and leakage · evaluation metrics for classification and regression · the no-free-lunch perspective on model choice.
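A common way to keep preprocessing leakage out of cross-validation is to put the preprocessor inside a scikit-learn Pipeline, so each fold fits it on that fold's training portion only. A minimal sketch (the dataset and scorer are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)

# The scaler lives inside the pipeline, so each CV fold fits it on
# that fold's training rows only -- no peeking at held-out rows.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(scores.mean())
```

Fitting the scaler on the full dataset before cross-validating is one of the most frequent leakage bugs this week's labs target.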

Labs

Lab 1 — Patient readmission prediction

Build a logistic-regression pipeline predicting 30-day readmission. Evaluate it not only by AUC but also by Brier score and a reliability diagram.

Dataset: MIMIC-IV demo (a clean 1,000-patient excerpt redistributed with the bootcamp).
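The evaluation side of the lab can be sketched as follows. This uses a synthetic stand-in for the readmission data, not the MIMIC-IV demo itself, and the class balance is an assumption:

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: label = "readmitted within 30 days", ~20% positive.
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.8], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]

print("AUC:  ", roc_auc_score(y_test, proba))
print("Brier:", brier_score_loss(y_test, proba))

# Reliability diagram data: observed frequency vs. mean predicted
# probability per bin; plot frac_pos against mean_pred.
frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)
```

AUC measures ranking; the Brier score and the reliability diagram measure whether the predicted probabilities themselves can be trusted, which matters when the model's output feeds a clinical decision threshold.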

Lab 2 — Diagnose the broken evaluation

A notebook is supplied with three deliberate leakage points and two metric-misuse traps. Find them, fix them, and write a 200-word memo explaining each.

Dataset: Synthetic, supplied.
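As a warm-up for the hunt, here is one classic leakage pattern of the kind the notebook plants (a hypothetical illustration, not one of the actual planted bugs): fitting the scaler on the full dataset before splitting, so test-set statistics bleed into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

# LEAKY: the scaler's mean/std are computed over test rows too.
X_scaled = StandardScaler().fit_transform(X)
X_tr_leaky, X_te_leaky, *_ = train_test_split(X_scaled, y, random_state=0)

# FIXED: split first, fit the scaler on the training split only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr_ok, X_te_ok = scaler.transform(X_tr), scaler.transform(X_te)
```

The memo for each finding should name the leaked information, the step where it crosses the split boundary, and the fix.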

Readings

Mandatory

Optional deepening

  • Sebastian Raschka, *Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning* (arXiv:1811.12808) — Comprehensive review of CV variants and significance tests.
