Week 01 — Python for data work
Module 1: idiomatic Python, NumPy, Pandas, reproducible projects, Jupyter discipline.
Take the Python you already half-know and make it precise enough to ship.
What you ship this week
Public GitHub repo with the cleaned-up Lab-1 refactor: README.md, pyproject.toml, working pytest suite, CI passing on a single push.
| Due | Friday 18:00, Africa/Lagos (UTC+1) |
|---|---|
| Submission | Drop the repo URL into the week's cohort channel. Peer-review pairing announced Monday of next week. |
| Rubric | Pass / revise. Pass requires green CI, tests covering the public API, and a README a stranger can follow to install and run the code. |
Live sessions and labs
Default weekly cadence below. Cohort-specific dates and Zoom links fill in at intake.
| Day | Time | Block | Recording |
|---|---|---|---|
| Mon | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Mon | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Tue | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Tue | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Wed | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Wed | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Thu | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Thu | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Fri | 10:00-11:00 | Industry speaker | (post-session) |
| Fri | 11:30-12:30 | Lab review | (post-session) |
| Fri | 14:00-15:00 | Cohort retrospective | (post-session) |
Learning outcomes
By the end of the week, every participant will:
- Write idiomatic Python — comprehensions, generators, context managers, decorators.
- Use NumPy and Pandas fluently for vectorized data manipulation.
- Build a reproducible Python project (virtualenv, pyproject.toml, pre-commit, basic testing).
- Read, modify, and write Jupyter notebooks without losing reproducibility.
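As a rough illustration of the four idioms named in the first outcome, here is a minimal sketch using only the standard library. The names `timed`, `collecting`, `clean_names`, and `name_lengths` are hypothetical and exist only for this example; they are not part of the course materials.

```python
import time
from contextlib import contextmanager
from functools import wraps


def timed(func):
    """Decorator: store the duration of the most recent call on the wrapper."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.last_elapsed = time.perf_counter() - start

    wrapper.last_elapsed = None
    return wrapper


@contextmanager
def collecting(log):
    """Context manager: append start/end markers around a block, even on error."""
    log.append("start")
    try:
        yield log
    finally:
        log.append("end")


def clean_names(lines):
    """Generator: lazily yield stripped, non-empty lines."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield stripped


@timed
def name_lengths(lines):
    """Dict comprehension consuming the generator above."""
    return {name: len(name) for name in clean_names(lines)}
```

Calling `name_lengths(["  Nairobi ", "", "Kisumu\n"])` inside `with collecting(log):` exercises all four pieces at once: the generator drops the blank line, the comprehension builds the dict, the decorator records the timing, and the context manager brackets the call.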
Topics covered
Data types and control flow · functions, scoping, closures · classes and protocols · NumPy arrays, broadcasting, indexing · Pandas Series and DataFrames, joins, group-by, reshape · matplotlib and seaborn for plotting · virtual environments and dependency management · Git basics for code and notebooks.
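For a sense of what "broadcasting" and "indexing" mean in practice, the sketch below works on a tiny array of made-up numbers. The variable names and values are assumptions chosen for illustration, not course data.

```python
import numpy as np

# Made-up numbers: rows are 3 hypothetical facilities, columns are 2 months of visits.
visits = np.array([
    [120.0, 80.0],
    [60.0, 40.0],
    [20.0, 80.0],
])

# Column totals have shape (2,); broadcasting stretches them across all 3 rows.
column_totals = visits.sum(axis=0)
share_of_month = visits / column_totals

# Row means must keep shape (3, 1) so they broadcast across the 2 columns.
row_means = visits.mean(axis=1, keepdims=True)
centered = visits - row_means

# Boolean indexing: keep only the rows whose first-month visits exceed 50.
busy = visits[visits[:, 0] > 50]
```

The `keepdims=True` argument is the detail worth noticing: without it the row means would have shape `(3,)`, which does not broadcast against a `(3, 2)` array.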
Labs
Lab 1 — Refactor the 300-line script
A deliberately ugly messy_data_pipeline.py is provided. Refactor it into a typed src/ package with a pytest suite that catches the two latent bugs, and add a pyproject.toml, a README.md, and a passing GitHub Actions workflow.
Dataset: Kenya Health Facilities Registry, December 2024 export (~12,000 facilities)
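To give a feel for the target shape of the refactor, here is a minimal sketch of one typed helper with pytest-style tests. `parse_bed_count` and its behavior are hypothetical; the lab script's real functions and its two latent bugs are not described here, so treat this as a pattern rather than a solution.

```python
from typing import Optional


def parse_bed_count(raw: Optional[str]) -> Optional[int]:
    """Hypothetical helper: turn a raw registry cell into a bed count.

    Returns None for missing or non-numeric cells instead of guessing a value.
    """
    if raw is None:
        return None
    text = raw.strip().replace(",", "")
    if not text.isdigit():
        return None
    return int(text)


# pytest collects functions named test_*; a plain assert is all it needs.
def test_parses_plain_and_padded_numbers() -> None:
    assert parse_bed_count("12") == 12
    assert parse_bed_count(" 1,200 ") == 1200


def test_missing_or_junk_cells_become_none() -> None:
    assert parse_bed_count(None) is None
    assert parse_bed_count("") is None
    assert parse_bed_count("n/a") is None
    assert parse_bed_count("-5") is None
```

In a real src/ layout the helper and the tests would live in separate files; they share one block here only to keep the sketch self-contained.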
Lab 2 — Pandas wrangling on real data
From raw Kenya health-facility data, produce an analytical DataFrame answering three questions: public vs private share by county, underserved areas by population per facility, and whether weighting by facility capacity changes the picture.
Dataset: Same dataset as Lab 1, with population overlays.
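The three questions map onto a small set of Pandas operations. The sketch below uses made-up rows and assumed column names (`county`, `owner_type`, `beds`, `population`); the real registry export will have a different schema, and bed count is only one possible proxy for capacity.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the registry.
facilities = pd.DataFrame({
    "county": ["A", "A", "A", "B", "B"],
    "owner_type": ["public", "public", "private", "public", "private"],
    "beds": [50, 10, 40, 5, 5],
})
population = pd.DataFrame({"county": ["A", "B"], "population": [300_000, 100_000]})

# Q1: public vs private share of facilities within each county (each row sums to 1).
share = pd.crosstab(facilities["county"], facilities["owner_type"], normalize="index")

# Q2: population per facility, after joining the population overlay on county.
per_county = (
    facilities.groupby("county")
    .agg(n_facilities=("owner_type", "size"), beds=("beds", "sum"))
    .join(population.set_index("county"))
)
per_county["pop_per_facility"] = per_county["population"] / per_county["n_facilities"]

# Q3: the same ratio weighted by capacity, using total beds as the denominator.
per_county["pop_per_bed"] = per_county["population"] / per_county["beds"]
```

In this toy data, county A has more people per facility while county B has more people per bed, which is the kind of reversal the third question asks you to look for.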
Lab 3 — Publish your project
Push the refactored Lab 1 package to a public GitHub repo with a release tag, a versioned pyproject.toml, and a CI badge. Add it to your bootcamp profile.
Dataset: —
Readings
Mandatory
- Before Tuesday. Hitchhiker's Guide to Python: "Writing great Python code" chapters (skim)
- Before Wednesday. Wes McKinney, *Python for Data Analysis* (3rd ed., 2022), chapters 4-5
- Before Thursday. *Python for Data Analysis* chapters 8 and 10
Optional deepening
- Luciano Ramalho, *Fluent Python* (2nd ed., 2022), chapters 17-18 and 24. For participants who already know Pandas well.