Week 01 — Python for data work

Module 1: idiomatic Python, NumPy, Pandas, reproducible projects, Jupyter discipline.

Take the Python you already half-know and make it precise enough to ship.

What you ship this week

A public GitHub repo containing the cleaned-up Lab 1 refactor: README.md, pyproject.toml, a working pytest suite, and CI that passes on push.

Due: Friday 18:00, Africa/Lagos (UTC+1)
Submission: Drop the repo URL into the week's cohort channel. Peer-review pairing is announced Monday of next week.
Rubric: Pass / revise. Pass requires green CI, tests covering the public API, and a README a stranger can follow to install and run the code.

Live sessions and labs

The default weekly cadence is below. Cohort-specific dates and Zoom links are filled in at intake.

Day  Time         Block                                       Recording
Mon  09:00-12:00  Live instruction + code-along               (post-session)
Mon  14:00-16:00  Independent lab work + TA office hours      (post-session)
Tue  09:00-12:00  Live instruction + code-along               (post-session)
Tue  14:00-16:00  Independent lab work + TA office hours      (post-session)
Wed  09:00-12:00  Live instruction + code-along               (post-session)
Wed  14:00-16:00  Independent lab work + TA office hours      (post-session)
Thu  09:00-12:00  Live instruction + code-along               (post-session)
Thu  14:00-16:00  Independent lab work + TA office hours      (post-session)
Fri  10:00-11:00  Industry speaker                            (post-session)
Fri  11:30-12:30  Lab review                                  (post-session)
Fri  14:00-15:00  Cohort retrospective                        (post-session)

Learning outcomes

By the end of the week, every participant will:

  1. Write idiomatic Python — comprehensions, generators, context managers, decorators.
  2. Use NumPy and Pandas fluently for vectorized data manipulation.
  3. Build a reproducible Python project (virtualenv, pyproject.toml, pre-commit, basic testing).
  4. Read, modify, and write Jupyter notebooks without losing reproducibility.
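
As a rough illustration of the constructs named in outcome 1, the sketch below combines a decorator, a context manager, a generator, and a comprehension in one small pipeline. Every name in it (`timed`, `opened_lines`, `clean_rows`, `load_names`) and the sample rows are hypothetical and are not taken from the course material.

```python
import functools
import io
import time
from contextlib import contextmanager


def timed(func):
    """Decorator: record the wall-clock duration of the most recent call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.last_duration = time.perf_counter() - start
    wrapper.last_duration = None
    return wrapper


@contextmanager
def opened_lines(stream):
    """Context manager: yield stripped lines lazily, close the stream on exit."""
    try:
        yield (line.strip() for line in stream)
    finally:
        stream.close()


def clean_rows(lines):
    """Generator: skip blank lines and split the rest on commas."""
    for line in lines:
        if line:
            yield [cell.strip() for cell in line.split(",")]


@timed
def load_names(stream):
    with opened_lines(stream) as lines:
        # Comprehension over a generator: first column, title-cased.
        return [row[0].title() for row in clean_rows(lines)]


# Hypothetical in-memory input standing in for a real file.
sample = io.StringIO("alpha clinic, public\n\nbeta hospital, private\n")
names = load_names(sample)
```

The list is built inside the `with` block, so the lazy generator is fully consumed before the stream is closed. Returning the generator itself from `load_names` would fail on first use, which is a common mistake when mixing generators and context managers.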

Topics covered

Data types and control flow · functions, scoping, closures · classes and protocols · NumPy arrays, broadcasting, indexing · Pandas Series and DataFrames, joins, group-by, reshape · matplotlib and seaborn for plotting · virtual environments and dependency management · Git basics for code and notebooks.

Labs

Lab 1 — Refactor the 300-line script

A deliberately ugly messy_data_pipeline.py is provided. Refactor it into a typed src/ package with pytest coverage that catches the two latent bugs, plus a pyproject.toml, a README.md, and a passing GitHub Actions workflow.

Dataset: Kenya Health Facilities Registry, December 2024 export (~12,000 facilities)
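
For orientation, here is a minimal sketch of what a typed function with pytest-style tests might look like for Lab 1. The function `normalise_county`, its behaviour, and the sample values are hypothetical; they are not taken from messy_data_pipeline.py and say nothing about where the two latent bugs are.

```python
from typing import Optional


def normalise_county(raw: Optional[str]) -> Optional[str]:
    """Return a trimmed, title-cased county name, or None for missing values."""
    if raw is None:
        return None
    cleaned = " ".join(raw.split())  # collapse runs of whitespace
    return cleaned.title() or None   # an empty string counts as missing


# pytest collects functions named test_*; plain assert statements are enough.
def test_normalise_county_trims_and_title_cases() -> None:
    assert normalise_county("  nairobi ") == "Nairobi"
    assert normalise_county("uasin   gishu") == "Uasin Gishu"


def test_normalise_county_treats_blank_as_missing() -> None:
    assert normalise_county("   ") is None
    assert normalise_county(None) is None
```

In a real package the function would live under src/ and the tests in a separate tests/ directory. They are shown together here only to keep the sketch self-contained.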

Lab 2 — Pandas wrangling on real data

From raw Kenya health-facility data, produce an analytical DataFrame answering three questions: public vs private share by county, underserved areas by population per facility, and whether weighting by facility capacity changes the picture.

Dataset: Same dataset as Lab 1, with population overlays.
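
The first two Lab 2 questions could be approached along the lines of the sketch below. The rows are made up for illustration, and the column names (`county`, `owner_type`, `population`) are assumptions, not the registry's actual schema. The capacity-weighting question is left out.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumptions.
facilities = pd.DataFrame(
    {
        "county": ["A", "A", "A", "B", "B"],
        "owner_type": ["public", "private", "public", "private", "private"],
    }
)
population = pd.DataFrame({"county": ["A", "B"], "population": [30_000, 50_000]})

# Question 1: public vs private share within each county (rows sum to 1).
share = pd.crosstab(
    facilities["county"], facilities["owner_type"], normalize="index"
)

# Question 2: population per facility, joined on county.
per_county = (
    facilities.groupby("county")
    .size()
    .rename("n_facilities")
    .reset_index()
    .merge(population, on="county", how="left", validate="one_to_one")
)
per_county["population_per_facility"] = (
    per_county["population"] / per_county["n_facilities"]
)
```

Passing `validate="one_to_one"` makes the merge raise if either side has duplicate county keys, which is a cheap guard against silently inflated row counts after a join.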

Lab 3 — Publish your project

Push the refactored Lab 1 package to a public GitHub repo with a release tag, a versioned pyproject.toml, and a CI badge. Add it to your bootcamp profile.

Readings

Mandatory

Optional deepening

  • Luciano Ramalho, *Fluent Python* (2nd ed., 2022), chapters 17-18 and 24. For participants who already know Pandas well.

Builds on (course catalogue)