Data-Science Portfolio

13 end-to-end data-science projects on real public data, with 5 long-form case studies. Bilingual EN/FR.

A bilingual portfolio of 13 end-to-end data-science projects on real public data (Kaggle, NASA POWER, USAID PEPFAR), each with a business question, an executable Jupyter notebook, a methodology write-up, and validation metrics. Five projects have long-form case studies with results tables, model comparisons, and honest trade-off analysis.

Live site: gabayae.github.io/data-portfolio · French version

Source code: github.com/gabayae/data-portfolio

Methodology coverage

  • Time-series forecasting: SARIMA, UnobservedComponents (state-space + Kalman), gradient-boosted regressors with weather exogenous covariates
  • GLMs / actuarial pricing: Poisson + Gamma, Tweedie compound-Poisson, monotonic-constrained boosting
  • Survival analysis: Kaplan-Meier, Cox proportional hazards, Weibull AFT
  • Stochastic optimization / RL: Markov decision processes, Q-learning, capped linear programming
  • Hierarchical reconciliation: MinT-OLS across SKU × store × week
  • Experimental design: ANOVA, Welch t-tests with Bonferroni correction, Bayesian A/B

Case studies

  • Hourly load forecasting (PJM East): GBM reaches 6.2% MAPE; UnobservedComponents (UC) prediction intervals reach 99% coverage. The right model depends on whether the reader is in procurement or in risk.
  • Lake Kariba river flow: GBM 7 cm RMSE on a 7 m operational band, which turns “will we breach turbine safety in 30 days” from a guess into a calibrated probability
  • Solar irradiance (Nairobi, NASA POWER): monthly climatology beats SARIMA at this latitude, since the seasonal envelope is most of the predictability
  • freMTPL2 pricing: Tweedie wins on Gini, Poisson + Gamma wins on top-decile lift; the choice is actuarial, not technical
  • Kenya mobile clinics: Q-learning vs capped LP; the constraint formulation matters more than the algorithm
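
The MAPE, RMSE, and prediction-interval coverage figures quoted above are standard validation metrics. The sketch below shows one way they might be computed. It is illustrative only: the function names and the toy numbers are assumptions made here, not code or results from the portfolio's notebooks.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must contain no zeros)."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def interval_coverage(y_true, lower, upper):
    """Percentage of observations that fall inside [lower, upper]."""
    hits = [lo <= t <= hi for t, lo, hi in zip(y_true, lower, upper)]
    return 100.0 * sum(hits) / len(hits)


# Toy numbers, invented for illustration only (not portfolio data).
y_true = [100.0, 200.0, 400.0, 500.0]
y_pred = [110.0, 190.0, 400.0, 450.0]
lower = [p - 20.0 for p in y_pred]  # hypothetical lower interval bound
upper = [p + 20.0 for p in y_pred]  # hypothetical upper interval bound

print(f"MAPE: {mape(y_true, y_pred):.2f}%")
print(f"RMSE: {rmse(y_true, y_pred):.2f}")
print(f"Interval coverage: {interval_coverage(y_true, lower, upper):.0f}%")
```

MAPE and RMSE score point forecasts, while coverage scores the intervals around them, which is why a model can look best on one and not on the other.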