Long-form deep-dives

Case studies

Selected portfolio projects with full narrative — business context, methodology, results, trade-offs, and deployment sketches. Each case study draws plots from its executed notebook so the page stands alone.

Energy · Time-series

Hourly load forecasting — PJM East

GBM 6.2% MAPE · UC PI coverage 99%

Day-ahead and short-horizon load forecasts on 16 years of real PJME hourly consumption — comparing SARIMA, state-space (UnobservedComponents + Fourier exog), and gradient-boosted forecasters. The "best" model depends on whether procurement or risk teams are reading.

SARIMA · UnobservedComponents · Fourier exog · GBM · PJM Hourly Consumption

Figure: 60-day forecast comparison of SARIMA, UC, and GBM on PJM data
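
The two headline numbers on this card measure different things: MAPE scores the point forecast, while PI coverage scores the prediction interval. The sketch below shows how each could be computed. The function names and the toy values are hypothetical and are not taken from the PJME notebook.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no zero actuals."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def pi_coverage(actual, lower, upper):
    """Percentage of observations that fall inside [lower, upper]."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return 100.0 * hits / len(actual)


# Hypothetical toy values (arbitrary load units), not PJME data.
actual = [100.0, 200.0, 400.0, 500.0]
forecast = [110.0, 190.0, 400.0, 450.0]
lower = [90.0, 195.0, 350.0, 440.0]
upper = [120.0, 210.0, 450.0, 490.0]

print(f"MAPE: {mape(actual, forecast):.2f}%")
print(f"PI coverage: {pi_coverage(actual, lower, upper):.1f}%")
```

A model can win on one metric and lose on the other, which is one way to read the claim that the "best" model depends on the audience.
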
Hydropower · Time-series

River flow forecasting — Lake Kariba

GBM 7 cm RMSE · SARIMA PI coverage 100%

Daily lake-level forecasting for the world's largest man-made reservoir by volume, on real Zambezi River Authority data. A 7 cm RMSE on a 7 m operational band (about 1% of the range) turns "are we going to breach turbine safety thresholds in the next 30 days?" from a guess into a calibrated probability.

SARIMA · State-space · GBM with exog · Lake Kariba reservoir · Zambezi basin

Figure: 30-day forecast comparison of SARIMA, UC, and GBM on Lake Kariba data
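
One simple way to turn a point forecast and its error into a breach probability is sketched below. It assumes Gaussian forecast errors with a constant standard deviation equal to the 7 cm RMSE, which is a simplification and not necessarily how the notebook derives its intervals. The function name and the margins are hypothetical.

```python
from statistics import NormalDist


def breach_probability(margin_m, sigma_m):
    """P(actual level < threshold) when the forecast sits margin_m above
    the threshold. Assumes zero-mean Gaussian forecast errors."""
    return NormalDist(mu=0.0, sigma=sigma_m).cdf(-margin_m)


SIGMA_M = 0.07  # assumption: error std equals the 7 cm RMSE quoted above

# Hypothetical margins between the forecast level and a safety threshold.
for margin_m in (0.0, 0.07, 0.14, 0.21):
    p = breach_probability(margin_m, SIGMA_M)
    print(f"margin {margin_m:.2f} m -> P(breach) = {p:.3f}")
```

Under these assumptions the probabilities are roughly 50%, 16%, 2.3%, and 0.1% for margins of zero, one, two, and three standard deviations. This is a single-day probability; answering the 30-day question would need the joint distribution over the whole horizon, for example via simulated forecast paths.
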
Solar · Time-series

Solar irradiance forecasting — Nairobi

GBM 9.4% MAPE · climatology 12.3% MAPE · SARIMA/UC PI coverage 99%

Ten years of daily NASA POWER data over Nairobi, with three forecasters compared against a stubbornly hard baseline: monthly climatology. The headline finding is uncomfortable for forecasters: at this latitude the seasonal envelope is most of the signal, and same-day weather covariates are what actually move point accuracy.

SARIMA · UnobservedComponents · Fourier annual · GBM weather exog · NASA POWER API

Figure: 90-day solar irradiance forecast comparison of climatology, SARIMA, and GBM
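
A monthly climatology baseline of the kind described above can be sketched in a few lines: the forecast for any day is the historical mean of that calendar month. The function names and toy values are hypothetical, and the notebook may define its baseline differently (for example, with a day-of-year average).

```python
from collections import defaultdict
from datetime import date


def monthly_climatology(history):
    """Map month (1-12) to the mean of all historical values in that month."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in history:
        sums[day.month] += value
        counts[day.month] += 1
    return {month: sums[month] / counts[month] for month in sums}


def climatology_forecast(clim, days):
    """Forecast each day with the mean of its calendar month."""
    return [clim[day.month] for day in days]


# Hypothetical toy irradiance values, not NASA POWER data.
history = [
    (date(2020, 1, 10), 6.0), (date(2021, 1, 12), 7.0),
    (date(2020, 7, 10), 4.0), (date(2021, 7, 12), 5.0),
]
clim = monthly_climatology(history)
print(climatology_forecast(clim, [date(2022, 1, 5), date(2022, 7, 20)]))
# [6.5, 4.5]
```

The baseline uses no model and no covariates, which is what makes it an uncomfortable benchmark when it lands within a few MAPE points of the fitted forecasters.
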
Insurance · GLM / Pricing

Pure-premium pricing — freMTPL2

Tweedie Gini 0.310 · Poisson+Gamma top-decile lift 2.66

Three GLM families and a gradient-boosted challenger compared on the canonical French motor third-party liability dataset (678k policies). Tweedie wins on segmentation power (Gini); Poisson + Gamma wins on top-decile lift. Picking between them is an actuarial decision, not a modelling one.

Poisson GLM · Gamma GLM · Tweedie · XGBoost · Gini · Lorenz · lift

Figure: Lorenz curves comparing Tweedie, Poisson+Gamma, and GBM
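
The two ranking metrics on this card can disagree because they look at different parts of the portfolio: Gini summarises the whole Lorenz curve, while top-decile lift looks only at the riskiest 10%. The sketch below shows one common way to compute each. It assumes equal exposure per policy and ignores ties in the predictions; the notebook's exact definitions (for example, exposure weighting) may differ, and the function names and toy values are hypothetical.

```python
def lorenz_gini(actual, predicted):
    """Gini = 1 - 2 * area under the Lorenz curve, with policies sorted by
    predicted premium (ascending). Assumes equal exposure per policy."""
    n = len(actual)
    total = sum(actual)
    order = sorted(range(n), key=lambda i: predicted[i])
    cum_share = 0.0
    area = 0.0
    for i in order:
        prev = cum_share
        cum_share += actual[i] / total
        area += (prev + cum_share) / (2 * n)  # trapezoid of width 1/n
    return 1.0 - 2.0 * area


def top_decile_lift(actual, predicted):
    """Mean actual loss in the top 10% by prediction over the overall mean."""
    n = len(actual)
    k = max(1, n // 10)
    top = sorted(range(n), key=lambda i: predicted[i], reverse=True)[:k]
    return (sum(actual[i] for i in top) / k) / (sum(actual) / n)


# Hypothetical toy portfolio of 10 policies, not freMTPL2 data.
actual = [0.0] * 8 + [20.0, 80.0]
predicted = [float(i) for i in range(1, 11)]

print(f"Gini: {lorenz_gini(actual, predicted):.2f}")                # about 0.86
print(f"Top-decile lift: {top_decile_lift(actual, predicted):.1f}")  # 8.0
```
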
Health Ops · Stochastic Optimization

Mobile-clinic scheduling — Kenya

Q-learning +122% · LP-capped +39% with explicit equity

Mobile-clinic dispatch as a Markov decision process on real KMPDC + SHA Kenyan data. Three policies — manual round-robin, capped LP, tabular Q-learning — compared on patients-served, travel cost, and equity. The honest finding: the algorithm matters less than how the constraints are written.

MDP · Q-learning · Linear programming · scipy.linprog · Real KMPDC + SHA

Figure: Q-learning training reward curve over 400 episodes
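
For readers unfamiliar with tabular Q-learning, the sketch below runs the standard update rule on a deliberately tiny dispatch problem. Everything in it is a hypothetical stand-in: two sites, made-up demand and travel cost, and hand-picked hyperparameters. It is not the state space, reward, or equity constraint used in the Kenya notebook; only the 400-episode count echoes the figure above.

```python
import random

N_SITES = 2
DEMAND = [5.0, 8.0]   # hypothetical patients served per visit at each site
TRAVEL_COST = 2.0     # hypothetical cost of moving between sites
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3  # hypothetical hyperparameters


def step(state, action):
    """State = current site, action = next site to visit."""
    reward = DEMAND[action] - (TRAVEL_COST if action != state else 0.0)
    return action, reward


def q_update(q, state, action, reward, next_state):
    """Standard tabular Q-learning update, applied in place."""
    target = reward + GAMMA * max(q[next_state])
    q[state][action] += ALPHA * (target - q[state][action])


def train(episodes=400, horizon=10, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_SITES for _ in range(N_SITES)]
    for _ in range(episodes):
        state = 0  # every episode starts at site 0
        for _ in range(horizon):
            if rng.random() < EPSILON:  # explore
                action = rng.randrange(N_SITES)
            else:                       # exploit the current estimate
                action = max(range(N_SITES), key=lambda a: q[state][a])
            next_state, reward = step(state, action)
            q_update(q, state, action, reward, next_state)
            state = next_state
    return q


q = train()
policy = [max(range(N_SITES), key=lambda a: q[s][a]) for s in range(N_SITES)]
print("greedy next site from each site:", policy)
```

With no constraint on the reward, the learned policy simply parks the clinic at the higher-demand site. That is the behaviour an equity cap or floor would have to rule out, which is consistent with the point that how the constraints are written matters more than the algorithm.
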
Five deep-dives now span three methodology families: time-series (PJM, Kariba, Nairobi solar), GLMs / actuarial pricing (freMTPL2), and stochastic optimization (Kenya). The 8 other portfolio projects each have their own folder-level README and executable notebook.