YG
Real African + Global Open Data

13 projects. Real data. Real models.

End-to-end data-science and machine-learning projects on real public data — predictive modeling, statistical inference, forecasting, optimization, and experimentation across healthcare, energy, finance, retail, and beyond.

At a glance

Headline numbers from the portfolio

6.2%
MAPE — PJM hourly load (GBM, vs SARIMA 14.5%)
7 cm
RMSE — Lake Kariba lake-level (30 days)
0.71
AUC — MTN Nigeria churn (XGBoost)
+122%
Coverage lift — Kenya mobile clinics (Q-learning vs manual)
Thirteen projects

Projects

01
SARIMA WAPE 0.94

Health Supply-Chain Demand Forecasting

12-month horizon on real PEPFAR shipment records — Rwanda focus.

Time Series
SARIMA · UC state-space · Holt-Winters · GBM
USAID PEPFAR SCMS · 10,324 shipments
02
GBM MAPE 6.2% · UC PI 99%

Hourly Load Forecasting — PJM

State-space + Fourier exog vs SARIMA on 145k hours of PJME load.

Time Series
SARIMA · UnobservedComponents · Fourier exog · GBM
PJM Hourly Consumption · 145k records
03
Tweedie Gini 0.310 · Lift 2.52

Insurance Claims Frequency & Severity

Poisson / Gamma / Tweedie GLMs on freMTPL2 with GBM challenger.

Regression / GLM
Poisson GLM · Gamma GLM · Tweedie · XGBoost
freMTPL2 freq + sev · 678k policies
04
MinT-OLS reconciliation

Hierarchical Retail Demand — M5

SARIMA + GBM with MinT-OLS to make item × store × week forecasts coherent.

Time Series
SARIMA · GBM · MinT-OLS · RMSSE
M5 sample · 13 items × 10 stores × 275 weeks
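
A minimal sketch of what OLS reconciliation does, assuming a toy two-series hierarchy (total = A + B) rather than the project's item × store × week structure. MinT with an identity error covariance reduces to the projection S (S'S)^-1 S' applied to the base forecasts, which is what `reconcile_ols` computes. The summing matrix, the base forecasts, and all helper names here are hypothetical and chosen only for illustration.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def reconcile_ols(S, base):
    """Project base forecasts onto the coherent subspace: S (S'S)^-1 S' y_hat."""
    St = transpose(S)
    y = [[v] for v in base]
    bottom = solve(matmul(St, S), matmul(St, y))  # reconciled bottom-level forecasts
    return [row[0] for row in matmul(S, bottom)]


# Hypothetical hierarchy: rows are [total, A, B], columns are the bottom series A and B.
S = [[1, 1], [1, 0], [0, 1]]
base = [100.0, 60.0, 50.0]  # incoherent on purpose: 60 + 50 != 100
coherent = reconcile_ols(S, base)
```

After reconciliation the first entry of `coherent` equals the sum of the other two, so the total and bottom-level forecasts agree. A real hierarchy would use a sparse summing matrix and a linear-algebra library instead of these hand-rolled helpers.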
05
Q-learning +122% vs manual

Mobile-Clinic Scheduling — Kenya

MDP + Q-learning vs capped LP baseline on real KMPDC + SHA data.

Optimization
MDP · Q-learning · Linear programming · scipy.optimize.linprog
Kenya KMPDC + SHA · 7,876 facilities
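
For readers unfamiliar with tabular Q-learning, the update rule can be sketched on a toy problem. The 4-state chain below is an assumption made purely for illustration; it is not the project's state space, action set, or reward design, and the hyperparameters are arbitrary.

```python
import random

N_STATES, GOAL = 4, 3          # states 0..3; state 3 is terminal
ACTIONS = ("left", "right")    # action index 0 = left, 1 = right


def step(state, action):
    """Deterministic toy transition: reward 1.0 only on reaching GOAL."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # step cap so early random episodes terminate
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)  # explore, or break ties at random
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next-state value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q = q_learning()
policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:GOAL]]
```

On this chain the greedy policy read off the learned table moves right in every non-terminal state. A scheduling problem would replace `step` with a simulator of clinic placements and coverage.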
06
GBM RMSE 7 cm · SARIMA PI 100%

River Flow Forecasting — Lake Kariba

Daily lake level on real Zambezi reservoir data; turbine discharge as exog.

Time Series
SARIMA · State-space · GBM exog
Lake Kariba reservoir · 1,155 daily obs
07
GBM MAPE 9.4% · 10-yr daily

Solar Forecasting — Nairobi

Daily irradiance with weather covariates from NASA POWER API.

Time Series
SARIMA · UnobservedComponents · Fourier annual · GBM weather exog
NASA POWER · 3,652 days, 6 vars
08
Cox PH + Weibull AFT

Customer Survival — MTN Nigeria

Tenure-as-time, churn-as-event; KM, Cox PH, Weibull AFT, log-rank stratification.

Survival
Kaplan-Meier · Cox PH · Weibull AFT · Log-rank
MTN Nigeria · 974 customers
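
The tenure-as-time, churn-as-event framing can be illustrated with a hand-rolled Kaplan-Meier estimator. This is a sketch on made-up tenures, not the MTN Nigeria data, and in practice a library such as lifelines would be used instead of this hypothetical `kaplan_meier` helper.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    durations: observed tenure per customer
    events:    1 if the customer churned at that time, 0 if censored
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        churned = removed = 0
        # Group all customers who leave the risk set at time t.
        while i < len(pairs) and pairs[i][0] == t:
            churned += pairs[i][1]
            removed += 1
            i += 1
        if churned:
            surv *= 1.0 - churned / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical tenures in months; 0 marks customers still active (censored).
tenure = [2, 3, 3, 5, 8]
churn = [1, 1, 0, 1, 0]
curve = kaplan_meier(tenure, churn)
```

For these five customers the survival estimate steps down to roughly 0.8, 0.6, and 0.3 at months 2, 3, and 5. Censored customers shrink the risk set without forcing a drop in the curve.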
09
SARIMA MAPE 2.0% · price MAE 410 ZAR

Flight Demand & Price — Southern Africa

Daily volume forecast on top route + GBM price predictor across all routes.

Time Series
SARIMA · GBM · Dynamic pricing
SA Flight Prices · 15,393 flights
10
GBM R² 0.57 · OLS 0.51

Property Valuation — Lagos

Bedroom + property-type + neighborhood features on 9,607 Lagos sale listings.

Regression / GLM
OLS · GBM · Log target · Feature engineering
Lagos Housing · 9,607 listings
11
GBM R² 0.66 (vs OLS 0.31)

Geospatial Farm-Output Forecasting

Predict farm sales from lat/lon + farm + climate features across multiple African countries.

Spatial · Regression
Spatial features · OLS · GBM · Multi-country
African Farm Households · 9,597 surveyed
12
XGBoost AUC 0.71

Churn Classification — MTN Nigeria

XGBoost vs RF vs LogReg with calibration plot and retention-queue ranking.

Classification
Logistic · Random Forest · XGBoost · Calibration
MTN Nigeria · 974 customers
13
ANOVA p < 1e-9 · Cohen's d 0.68

A/B Test Framework — Marketing

Welch t-tests + ANOVA + OLS adjustment + Bayesian posterior on a real 3-arm trial.

Experimentation
ANOVA · Welch t-test · Bonferroni · Bayesian A/B
Fast-food A/B · 548 weekly obs · 3 arms
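
The pairwise comparison step can be sketched with the standard library alone: Welch's t statistic with Welch-Satterthwaite degrees of freedom, Cohen's d with a pooled standard deviation, and a Bonferroni adjustment. The sample values and p-values below are invented for illustration and are not the trial's data. Turning t and df into a p-value requires a t-distribution CDF, which this sketch leaves out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


def bonferroni(pvals):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


# Hypothetical weekly sales for two arms.
arm_a = [2.0, 4.0, 6.0]
arm_b = [1.0, 2.0, 3.0]
t_stat, df = welch_t(arm_a, arm_b)
d = cohens_d(arm_a, arm_b)
adjusted = bonferroni([0.01, 0.04, 0.5])  # made-up p-values for three pairwise tests
```

With three arms there are three pairwise tests, so Bonferroni multiplies each raw p-value by three before comparing it to the significance level.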
Same shape, every time

Process

Business question
Data & EDA
Modeling
Validation
Deployment
Business outcome

Data: Kaggle CLI · NASA POWER · pandas · SQL
EDA: matplotlib · seaborn · seasonal_decompose
Modeling: statsmodels · scikit-learn · XGBoost · lifelines
Validation: rolling-origin backtest · cross-validation · log-rank · ANOVA · calibration
Deployment: FastAPI · Streamlit · pickled artifacts · scheduled retrain

What this portfolio demonstrates

Capabilities

Techniques

Time-series & forecasting

  • SARIMA / ARIMA
    01 · 02 · 04 · 06 · 07 · 09
  • UnobservedComponents (state-space)
    01 · 02 · 06 · 07
  • Holt-Winters / ETS
    01
  • Hierarchical reconciliation (MinT)
    04
  • Rolling-origin backtest · PI calibration
    01 · 02 · 06 · 07
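
Rolling-origin backtesting, as listed above, can be sketched as follows: the model is refit on all data up to each origin and scored on the next block of points. The seasonal-naive forecaster is a hypothetical stand-in for the SARIMA and GBM models used in the projects, and the series is synthetic. `mape` and `wape` follow the usual definitions of the two error metrics quoted on the project cards.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actuals must be non-zero)."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def wape(actual, forecast):
    """Weighted absolute percentage error: total abs error over total abs actual."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(abs(a) for a in actual)


def seasonal_naive(history, horizon, season=4):
    """Hypothetical stand-in model: repeat the last full season."""
    last = list(history[-season:])
    return [last[h % season] for h in range(horizon)]


def rolling_origin(series, initial, horizon, step, model):
    """Score `model` at successive origins, training only on data before each origin."""
    scores = []
    origin = initial
    while origin + horizon <= len(series):
        train = series[:origin]
        test = series[origin:origin + horizon]
        scores.append(mape(test, model(train, horizon)))
        origin += step
    return scores


# Synthetic, perfectly periodic series: the seasonal-naive model is exact here.
series = [10.0, 20.0, 30.0, 40.0] * 3
fold_mape = rolling_origin(series, initial=4, horizon=4, step=4, model=seasonal_naive)
```

Because each fold trains only on observations before its origin, the scores reflect out-of-sample error rather than in-sample fit.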

GLMs & statistical modeling

  • Poisson / Gamma / Tweedie GLM
    03
  • Cox PH · Weibull AFT
    08
  • OLS · log-target regression
    10 · 11
  • Logistic regression · L2
    12
  • Gini · Lorenz · top-decile lift
    03

Machine learning

  • Gradient-boosted trees (sklearn, XGBoost)
    02 · 03 · 06 · 07 · 10 · 11 · 12
  • Random Forest
    12
  • Lag · rolling · calendar feature engineering
    02 · 06 · 07
  • Calibration · ROC / PR curves
    12

Optimization & experimentation

  • Markov decision processes · Q-learning
    05
  • Linear programming (scipy.optimize.linprog)
    05
  • ANOVA · Welch t · Bonferroni
    13
  • Bayesian A/B (posterior simulation)
    13
  • Stratified analysis (Simpson guard)
    13

Tools

Languages & data

Python · SQL · pandas · NumPy · SciPy · PostgreSQL

Modeling libraries

statsmodels · scikit-learn · XGBoost · LightGBM · lifelines

Engineering & viz

FastAPI · Streamlit · Docker · Git · Jupyter · matplotlib · seaborn · LaTeX
Who

Background

PhD in Mathematics (Topology) from the University of Cape Town. Career split between rigorous applied mathematics and hands-on data science, with 10+ years of experience across healthcare, finance, energy, insurance, retail, and government environments. Co-author of The Shape of Data (No Starch Press, 2024) — a graduate-level textbook on geometry-based machine learning. h-index 12 across 18+ peer-reviewed papers. Bilingual EN/FR.