Machine Learning & AI Bootcamp
A 10-week experiential cohort program — ten modules from Python through MLOps, deployed capstone, public portfolio.
Program Overview
In January 2023, BioNTech announced its acquisition of InstaDeep, a machine-learning company founded in Tunis with offices in Lagos, Nairobi, and Paris, for up to £562 million. The deal made public what recruiters across the continent had been observing for years: global demand for production-ready ML engineers far exceeds trained supply, and the gap is wider in Africa than anywhere else. Most African undergraduate computer-science curricula end before deep learning; the MOOCs that fill the void are calibrated to North-American prerequisites and usually presume a prior software-engineering job. This bootcamp targets that gap directly. It takes quantitatively trained graduates (mathematics, statistics, computer science, quantitative finance, scientific engineering) and brings them in ten weeks to the production-ready level that African and international ML teams now hire for.
Program at a glance
| Item | Detail |
|---|---|
| Live sessions | 60+ |
| Capstone project | 1 (team or individual) |
| Industry speakers | 8 |
| Course hours | 200 |
| Self-study (FT) | 20 hours / week |
| Self-study (PT) | 10 hours / week |
| Office hours | Weekly, with teaching assistants |
| Final deliverable | Deployed capstone + portfolio piece |
| Certificate | Yes, on completion of all modules + capstone |
The 200 course hours are split roughly 60 % live (lectures, labs, code-along sessions, capstone reviews) and 40 % structured asynchronous work (reading, lab notebooks, problem sets). The 60+ live sessions include lectures, hands-on labs, industry-speaker sessions, office hours, capstone milestone reviews, and the final showcase.
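As a quick check on the figures above, the split can be worked out in a few lines of Python. This is only a sketch of the arithmetic: the 60 % share is the rounded figure quoted in the text, and the 20-week part-time length is taken from the schedule section.

```python
# Figures quoted in this program description.
COURSE_HOURS = 200
LIVE_SHARE_PCT = 60        # "roughly 60 %" live, so treat results as approximate
FULL_TIME_WEEKS = 10
PART_TIME_WEEKS = 20       # part-time cohorts: same content at half the weekly intensity

live_hours = COURSE_HOURS * LIVE_SHARE_PCT // 100
async_hours = COURSE_HOURS - live_hours

ft_course_hours_per_week = COURSE_HOURS / FULL_TIME_WEEKS
pt_course_hours_per_week = COURSE_HOURS / PART_TIME_WEEKS

print(f"live: about {live_hours} h, asynchronous: about {async_hours} h")
print(f"course hours per week: {ft_course_hours_per_week:.0f} (FT), "
      f"{pt_course_hours_per_week:.0f} (PT)")
```

The self-study hours listed in the table are reported separately from these course hours.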
Tooling
- Python 3.11+ as the working language throughout.
- Jupyter (notebook + Lab) for labs.
- NumPy, Pandas, scikit-learn, matplotlib, seaborn as the analytics stack.
- PyTorch (primary) and TensorFlow/Keras (secondary) for deep learning.
- HuggingFace ecosystem (`transformers`, `datasets`, `peft`, `accelerate`) for NLP and LLMs.
- LangChain, ChromaDB, FAISS for RAG.
- MLflow, DVC, Docker, FastAPI for the MLOps module.
- Google Colab (free tier with GPU) as fallback compute; participants without local GPU will use Colab Pro or Kaggle for the deep-learning weeks.
- Git / GitHub for version control across the bootcamp; the final portfolio piece is published on the participant’s GitHub.
The ten modules
Each module below maps to existing course materials in the course catalogue when a corresponding course exists. The bootcamp condenses, sequences, and adds project work on top of the catalogue, rather than reinventing the content.
Module 1 — Python for data work
One sentence. Take the Python you already half-know and make it precise enough to ship.
Topics. Data types and control flow · functions, scoping, closures · classes and protocols · NumPy arrays, broadcasting, indexing · Pandas Series and DataFrames, joins, group-by, reshape · matplotlib and seaborn for plotting · virtual environments and dependency management · Git basics for code and notebooks.
Labs. (1) Refactor a messy 300-line script into a clean module with tests. (2) Wrangle a real public dataset (Kenyan health facility data) end-to-end in Pandas. (3) Publish your project to GitHub with a README, lockfile, and reproducible install.
Module 2 — Introduction to machine learning
One sentence. What ML actually is, what it isn’t, and the workflow that runs underneath every project.
Topics. The supervised learning loop · loss functions and risk · empirical risk minimization · bias-variance · cross-validation, train/val/test splits, leakage · evaluation metrics for classification and regression · the “no free lunch” perspective on model choice.
Labs. (1) Predict patient readmission on a real hospital dataset (logistic regression, evaluated with calibration not just accuracy). (2) Diagnose a bad evaluation: spot the leakage in a deliberately broken notebook.
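To illustrate why the readmission lab looks beyond accuracy, here is a minimal sketch in plain Python. It compares two hypothetical sets of predicted probabilities that have identical accuracy but different Brier scores. The labels and probabilities are invented for illustration, and the Brier score is only one of several ways to assess probability quality; the lab itself may use other tools such as reliability curves.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of examples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


# Invented labels and predictions, for illustration only.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.1, 0.8, 0.2, 0.4, 0.6, 0.1, 0.9]   # confident when it is right
model_b = [0.6, 0.4, 0.6, 0.4, 0.4, 0.6, 0.4, 0.6]   # always close to 0.5

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, "accuracy:", accuracy(y_true, probs),
          "Brier:", round(brier_score(y_true, probs), 3))
```

Both models make the same two mistakes, so accuracy cannot separate them. The Brier score can, because it penalizes how far each probability sits from the observed outcome.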
Module 3 — Classical machine learning
One sentence. The pre-deep-learning toolkit — still the right answer for most tabular problems.
Topics. Linear and logistic regression · regularization (ridge, lasso, elastic net) · SVMs and the kernel trick · decision trees, random forests, gradient boosting (XGBoost, LightGBM) · clustering (k-means, hierarchical, DBSCAN, GMM) · dimensionality reduction (PCA, UMAP, t-SNE) · model interpretation (permutation importance, SHAP, partial dependence) · what these methods can and cannot tell you about causation.
Labs. (1) Credit-scoring on a Kaggle / African bank dataset — full pipeline with fairness audit. (2) Customer segmentation by mobile-money transaction patterns. (3) SHAP-based interpretation of an XGBoost model, including the failure modes of SHAP itself.
Module 4 — Recommender systems
One sentence. How Netflix-style systems actually work, and the honest evaluation problem they create.
Topics. The recommendation problem · explicit vs implicit feedback · collaborative filtering · matrix factorization (SVD, ALS, NMF) · content-based filtering with embeddings · hybrid models · evaluation: precision@k, recall@k, NDCG, MAP, online vs offline metrics · cold-start, popularity bias, filter bubbles.
Labs. (1) MovieLens collaborative filter (SVD then ALS). (2) Build a hybrid recommender on a publicly available e-commerce dataset. (3) Run the same recommender against three evaluation metrics and explain why they rank models differently.
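The third lab asks why metrics can rank models differently. The sketch below is one small, hand-made case in plain Python: binary relevance, two hypothetical rankings, and simple implementations of precision@k, recall@k, and NDCG@k. The item names and rankings are invented for illustration, and production code would normally rely on a tested library rather than these helpers.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """NDCG with binary relevance: hits near the top count more."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


relevant = {"a", "b", "c"}                    # hypothetical ground truth
model_x = ["a", "n1", "n2", "n3", "n4"]       # one hit, at the very top
model_y = ["n1", "n2", "n3", "b", "c"]        # two hits, at the bottom

for name, ranking in [("X", model_x), ("Y", model_y)]:
    print(name,
          "P@5:", precision_at_k(ranking, relevant, 5),
          "R@5:", round(recall_at_k(ranking, relevant, 5), 3),
          "NDCG@5:", round(ndcg_at_k(ranking, relevant, 5), 3))
```

Precision and recall prefer model Y because it surfaces more relevant items. NDCG prefers model X because its single hit sits in first position, where the logarithmic discount is smallest.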
Module 5 — Natural language processing
One sentence. From the linguistics-aware classical methods to the transformer-era pipeline.
Topics. Text preprocessing and tokenization (BPE, WordPiece) · word embeddings (Word2Vec, GloVe, FastText) · sequence models (RNN, LSTM, GRU) · the Transformer architecture · BERT-family models and fine-tuning · NER, sentiment, classification, summarization · multilingual and low-resource NLP · evaluation: BLEU, ROUGE, exact match, human evaluation.
Labs. (1) Sentiment classification on customer reviews (logistic regression baseline → fine-tuned DistilBERT). (2) Named entity recognition on a multilingual dataset including at least one African language. (3) Document summarization with a fine-tuned T5/BART.
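For readers new to BPE, the merge-learning idea behind it can be sketched in plain Python. This is a toy version under simplifying assumptions (a three-word invented corpus, no end-of-word marker, helper names chosen for this example). It is not the implementation used by the HuggingFace tokenizers covered in the module.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, or None if no pairs remain."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Rewrite every word, fusing each occurrence of `pair` into one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


def learn_merges(words, num_merges):
    """Greedily learn up to `num_merges` merge rules from word frequencies."""
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


# Invented word counts; each word starts as a tuple of single characters.
corpus = {tuple("hug"): 5, tuple("pug"): 3, tuple("hugs"): 2}
merges, vocab = learn_merges(corpus, num_merges=2)
print(merges)
print(vocab)
```

Each round fuses the most frequent adjacent pair into a new symbol, so frequent substrings end up as single tokens while rare words stay split into smaller pieces.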
Module 6 — Modern machine learning: ANN, CNN, RNN
One sentence. Deep learning end-to-end, with enough theory to know when not to use it.
Topics. Feed-forward networks, backpropagation, optimization (SGD, Adam, AdamW) · regularization (dropout, batch norm, weight decay, early stopping) · CNNs (LeNet, AlexNet, ResNet, modern architectures) · RNNs, LSTMs, GRUs · attention as a primitive · representation learning · the bitter lessons of deep learning.
Labs. (1) Build a 2-layer MLP from scratch in NumPy, then port to PyTorch. (2) Image classification on a public medical imaging dataset. (3) Sequence prediction on a financial time-series dataset with an LSTM.
Module 7 — LLMs and generative AI
One sentence. What’s under the hood of GPT/Claude/LLaMA, what you can actually do with them, and where they fail.
Topics. The transformer, attention mechanism, scaling laws · pretraining, fine-tuning, RLHF/DPO at survey level · prompt engineering, structured output, function calling · parameter-efficient fine-tuning (LoRA, QLoRA, PEFT) · RAG: chunking, embedding, retrieval, reranking, generation · diffusion models · evaluation: BLEU, ROUGE, LLM-as-judge, human eval · agentic systems and tool use · safety, alignment, hallucination, bias.
Labs. (1) Build a RAG system over a domain-specific corpus (e.g., WHO/AFRO health reports) using ChromaDB and a local or API LLM. (2) Fine-tune a 1-7B-parameter open model with LoRA on a domain task. (3) Build a multi-step agent with tool use (search, calculator, code execution) using LangChain or LangGraph.
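The retrieval half of a RAG pipeline (chunk, embed, retrieve, then build a prompt) can be sketched with the standard library alone. Everything here is a stand-in chosen for illustration: the three-sentence corpus is invented placeholder text, the "embedding" is a bag-of-words count in place of a real embedding model, and a sorted list replaces a vector store such as ChromaDB. Reranking and the generation call are left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


# Invented placeholder corpus, not taken from any real report.
corpus = ("Malaria cases fell after bed nets were distributed. "
          "Vaccination coverage for measles rose in rural districts. "
          "Cholera outbreaks followed the seasonal floods.")

# Chunking step: one chunk per sentence.
chunks = re.split(r"(?<=[.!?])\s+", corpus.strip())

question = "Which districts improved measles vaccination coverage?"
top = retrieve(question, chunks)

# The retrieved chunk becomes the context for whichever LLM is used.
prompt = f"Answer using only this context:\n{top[0]}\n\nQuestion: {question}"
print(prompt)
```

Swapping the toy pieces for a real embedding model and a vector store changes the quality of retrieval but not the shape of the pipeline.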
Module 8 — MLOps and deployment
One sentence. What it takes for the model to keep working after the notebook is closed.
Topics. Reproducibility (Git, DVC, MLflow) · experiment tracking and model registry (MLflow, Weights & Biases) · containerization (Docker, docker-compose) · serving (FastAPI, BentoML, model registries) · monitoring (data drift, prediction drift, performance drift, latency, cost) · CI/CD for ML pipelines.
Labs. (1) Wrap a trained model in a FastAPI service, containerize, deploy to a free-tier cloud, call it from a notebook. (2) Set up MLflow tracking for a model retraining loop. (3) Simulate data drift on a deployed model and detect it from the monitoring dashboard.
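One simple way to flag data drift, as in the third lab, is the Population Stability Index (PSI) between a reference sample and a current sample of a single feature. The sketch below uses plain Python. The bin edges, the sample values, and the 0.25 alert level (a commonly quoted rule of thumb) are assumptions for illustration, not the monitoring setup used in the lab.

```python
import math


def bin_shares(values, edges, eps=1e-6):
    """Share of values in each half-open bin [edges[i], edges[i+1]).

    Values outside the edges are not counted in any bin; eps avoids log(0).
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return [max(c / len(values), eps) for c in counts]


def psi(reference, current, edges):
    """Population Stability Index between two samples of one feature."""
    ref = bin_shares(reference, edges)
    cur = bin_shares(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


EDGES = [0, 10, 20, 30]                                   # hypothetical bins
reference = [2, 5, 7, 9, 11, 13, 15, 18, 22, 27]          # training-time sample
shifted = [8, 12, 14, 19, 21, 23, 24, 26, 28, 29]         # simulated drifted sample

ALERT_LEVEL = 0.25   # rule-of-thumb threshold, an assumption here
score = psi(reference, shifted, EDGES)
print(f"PSI = {score:.3f}, drift flagged: {score > ALERT_LEVEL}")
```

A PSI of zero means the two samples fall into the bins in identical proportions, and larger values mean the current data has moved further from the reference distribution.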
Module 9 — Capstone project
One sentence. Take a real problem from idea to deployed system in two weeks.
This module occupies the last two weeks of the program (weeks 9 and 10) and runs in parallel with Module 10. Each participant (or small team of 2-3) ships a complete project: real dataset, real model, deployed endpoint, written report, public repository, live demo.
See CAPSTONE.md for the full project specification, milestones, evaluation rubric, and example project tracks.
Module 10 — Portfolio
One sentence. The capstone, the labs, and a clear public profile that says “I can actually do this.”
Topics. Portfolio curation (less is more) · README discipline · technical writing for ML projects · publishing notebooks (nbviewer, Colab, GitHub Pages, Jupyter Book) · hosting model demos (Gradio, Streamlit, HuggingFace Spaces) · using your portfolio to ask better technical interview questions.
Schedule (10-week full-time cohort)
| Week | Focus | Modules |
|---|---|---|
| 1 | Python toolkit for data work | M1 |
| 2 | Introduction to ML + start of classical ML | M2 + M3 (part 1) |
| 3 | Classical ML in depth, credit-scoring lab | M3 |
| 4 | Recommender systems | M4 |
| 5 | Natural language processing | M5 |
| 6 | Deep learning (ANN, CNN, RNN) | M6 |
| 7 | LLMs and generative AI | M7 |
| 8 | MLOps and deployment | M8 |
| 9 | Capstone work-week + portfolio sessions | M9 + M10 |
| 10 | Capstone wrap-up, final presentations, portfolio launch | M9 + M10 |
Part-time cohorts run the same content over 20 weeks at half the weekly intensity. Industry-speaker sessions are distributed across weeks 2-9.
Pedagogy
The “experiential approach” is not marketing — it’s a deliberate choice with two concrete consequences:
- Every module has a real dataset and a deployable artifact. Not a tutorial dataset chosen for pedagogical neatness, but data that participants will plausibly encounter in their own work: African banking records, health facility data, telecom CDRs, satellite imagery, multilingual text. The artifact at the end of the module is something a participant could put on GitHub.
- Mistakes are part of the curriculum. Several labs are deliberately seeded with leakage, bad evaluation choices, or training pathologies. The exercise is to find them. The point is that spotting a broken ML pipeline is more valuable than building a working one in a vacuum.
Assessment and certificate
- Weekly lab deliverables (40 %) — each module’s labs are graded on correctness, code quality, and a short written interpretation.
- Capstone project (40 %) — deployed system + technical writeup + live demo.
- Participation (20 %) — engagement in live sessions, peer code review, capstone milestone presentations.
Participants who complete all module deliverables and pass the capstone (rubric in CAPSTONE.md) receive a Certificate of Completion signed by the lead instructor and AIRINA Labs. The certificate is a record of completion, not an accredited degree.
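The 40 / 40 / 20 weighting above amounts to a weighted average. A minimal sketch follows, assuming each component is marked on a 0-100 scale. The scale and the example scores are assumptions for illustration, and the certificate criteria remain the ones stated above rather than any score threshold.

```python
# Assessment weights from this section, in percent.
WEIGHTS = {"labs": 40, "capstone": 40, "participation": 20}


def final_score(scores):
    """Weighted average of component scores (each assumed to be on a 0-100 scale)."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Hypothetical participant, for illustration only.
example = {"labs": 80, "capstone": 70, "participation": 90}
print(final_score(example))
```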
Industry-speaker series
The 8 industry speakers across the cohort are drawn from the AIRINA Labs and AIMS networks: ML/AI practitioners at African banks, telecoms, fintechs, and the regional offices of global tech firms; one or two senior researchers at international labs. Each speaker session is one hour: 30 minutes of “what we actually build and why”, 30 minutes of cohort Q&A. Speakers do not pitch products and do not recruit.
Post-program
For three months after the cohort ends, participants retain access to:
- The cohort Slack workspace (peer network + alumni channel).
- One 30-minute 1-on-1 with the lead instructor for career or technical follow-up.
- The hosted version of their capstone (the instructor will keep the deployment up for at least 3 months on a free-tier cloud).
This isn’t a job-placement service. The goal is to leave participants with a working portfolio, a peer network, and the technical foundation to take their own next step.
Resources
- Modules 1-3 (Python, ML foundations). Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (3rd ed., O’Reilly).
- Module 3 (Classical ML). Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning (Springer, 2nd ed.).
- Module 4 (Recommender systems). Charu Aggarwal, Recommender Systems: The Textbook (Springer, 2016).
- Module 5 (NLP). Dan Jurafsky and James Martin, Speech and Language Processing (3rd ed. draft).
- Module 6 (Deep learning). Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning (MIT Press, 2016).
- Module 7 (LLMs / GenAI). Sebastian Raschka, Build a Large Language Model From Scratch (Manning, 2024). HuggingFace NLP course.
- Module 8 (MLOps). Chip Huyen, Designing Machine Learning Systems (O’Reilly, 2022).
- Topology / geometry-aware ML companion. Colleen M. Farrelly and Yaé Ulrich Gaba, The Shape of Data (No Starch Press).
Learning Outcomes
By the end of the bootcamp, participants will be able to:
- Build, evaluate, and deploy a complete ML system end-to-end — from raw data to a working hosted demo.
- Choose the right model family for a problem and defend the choice against simpler baselines.
- Diagnose the standard failure modes (data leakage, miscalibration, distribution shift, dead training) before they ship.
- Apply the modern LLM toolkit (prompt engineering, fine-tuning with LoRA, RAG, agents) to real domain problems.
- Set up the MLOps machinery (versioning, containerization, monitoring, CI/CD) needed to keep a deployed model alive.
- Curate a public portfolio that demonstrates working production-grade ML to a technical interviewer.
Who Should Attend
- Working professionals moving into ML/AI roles from adjacent fields — software engineering, data analysis, quantitative finance, actuarial work, scientific computing.
- Upper-undergraduate and masters students in mathematics, computer science, statistics, engineering, or quantitative social science who want an intensive route rather than a semester-paced course.
- Corporate cohorts running an internal upskilling program — banks, telecoms, insurers, fintechs, public-sector data teams.
You will get more out of the bootcamp if you arrive comfortable with Python syntax (functions, classes, comprehensions), familiar with the command line, and with undergraduate-level mathematics (linear algebra, calculus, probability). No prior machine-learning experience is required, but it helps.
Prerequisites:
- Required. Python proficiency at the level of “can read and modify a 200-line script without getting lost”. Comfortable with functions, classes, imports, dictionaries, list comprehensions. Familiar with `pip` and virtual environments.
- Required. Linear algebra (vectors, matrices, eigenvalues), calculus (derivatives, gradients, integrals), probability (random variables, expectation, conditional probability) at undergraduate level.
- Strongly recommended. Some exposure to NumPy / Pandas.
- Helpful but not required. Prior ML exposure, familiarity with Git/GitHub, basic Linux command line.
If you are not sure whether you have the prerequisites, the application includes a short technical screen. Honest self-assessment up front is better than scrambling in week 1.
Brochure
For a printable one-page brochure suitable for forwarding to a corporate L&D team, university department, or admissions committee, write to gabayae2@gmail.com with the audience size and intended cohort window.
To apply directly for a future open cohort, or to discuss a private corporate cohort for your organization, write to gabayae2@gmail.com. Please include your background, your goal for the bootcamp, and whether you’re asking about an open cohort (individual) or a private corporate cohort (team). For corporate cohorts, also include the team size and the time window you’re targeting.