Case study · Energy · Time-series

Solar irradiance forecasting — Nairobi

90-day daily irradiance forecasting for Nairobi on a free, programmatic, decade-long dataset. GBM with weather covariates wins on point accuracy. Monthly climatology is a stronger baseline than most people expect. Structural state-space models give the operationally useful uncertainty signal. Same three-way trade-off as PJM, in a different physical regime.

Best model · GBM with weather covariates
MAPE · 9.4% (vs SARIMA 13.8%, climatology 12.3%)
SARIMA / UC PI coverage · 99% / 99%
Data · 3,652 daily obs · 2014–2023 · NASA POWER

Summary

A gradient-boosted regressor that uses lagged irradiance plus same-day weather covariates (cloud cover, humidity, precipitation, temperature, wind) forecasts daily Nairobi irradiance with 9.4% MAPE across a 90-day horizon. SARIMA gets 13.8%; a UC state-space model, 21.6%. Two findings are worth keeping. First, monthly climatology hits 12.3% MAPE (better than SARIMA), so the seasonal envelope alone explains most of the predictability. Second, both SARIMA and UC put 99% of realised days inside their nominal 95% prediction intervals (slightly conservative rather than under-covering). For a battery-sizing or grid-balancing decision, that reliable interval is what actually drives sizing, not the GBM's more accurate point forecast.

Why this matters

Kenya runs one of the largest pay-as-you-go solar markets in Africa, with millions of household systems on sub-day battery capacity. The forecast question repeats every evening: charge the battery hard tonight, or assume tomorrow's sun will be enough? Same question at the utility level for grid integrators balancing solar against thermal and hydro. A 1-day-ahead irradiance error of 5% can flip whether a battery should be filled tonight or held in reserve. At 90-day horizons the question shifts to budgeting and storage-contract negotiation, but the metric vocabulary is the same.

The business question

Three operational consumers sit on the same forecast:

Three customers, same forecast, three different things they look at. The case study compares a SARIMA baseline, a structural state-space model, and an ML challenger to see which fits which need.

Data

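NASA POWER API: free, programmatic, no authentication required. The dataset covers 10 years (2014-01-01 → 2023-12-31) of daily values for Nairobi (lat -1.2921, lon 36.8219): surface irradiance (ALLSKY_SFC_SW_DWN) plus cloud cover, humidity, precipitation, temperature, and wind.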

The last 90 days (2023-10-03 → 2023-12-31) are held out as the test window; everything before that is training. NASA POWER is free to use and the API is a single URL endpoint. The same pipeline switches to Lagos, Cape Town, Cairo, or Dakar by changing two numbers (latitude and longitude) in download_data.py.
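A minimal sketch of what the download step might look like. It assumes the public POWER daily point endpoint (`https://power.larc.nasa.gov/api/temporal/daily/point`), the `community=RE` query value, and a JSON layout of `properties.parameter.<CODE>.<YYYYMMDD>`. The five covariate codes (`CLOUD_AMT`, `RH2M`, `PRECTOTCORR`, `T2M`, `WS2M`) are our guess at which POWER parameters map to cloud cover, humidity, precipitation, temperature, and wind. `build_url` and `parse_power_json` are hypothetical helpers, not the contents of download_data.py.

```python
from urllib.parse import urlencode

# Assumed endpoint, community and parameter codes; verify against the POWER docs.
BASE_URL = "https://power.larc.nasa.gov/api/temporal/daily/point"
PARAMETERS = ["ALLSKY_SFC_SW_DWN", "CLOUD_AMT", "RH2M", "PRECTOTCORR", "T2M", "WS2M"]
LAT, LON = -1.2921, 36.8219  # Nairobi: the "two numbers" to change for another city
FILL_VALUE = -999.0  # assumed missing-data marker in POWER responses


def build_url(lat=LAT, lon=LON, start="20140101", end="20231231"):
    """Return the request URL for daily values at one point (no network call)."""
    query = {
        "parameters": ",".join(PARAMETERS),
        "community": "RE",
        "longitude": lon,
        "latitude": lat,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    return f"{BASE_URL}?{urlencode(query)}"


def parse_power_json(payload):
    """Turn the assumed JSON layout into a date-sorted list of row dicts.

    Fill values become None so downstream code can treat them as missing.
    """
    params = payload["properties"]["parameter"]
    dates = sorted(next(iter(params.values())).keys())
    rows = []
    for day in dates:
        row = {"date": day}
        for name, series in params.items():
            value = series.get(day)
            row[name] = None if value is None or value == FILL_VALUE else value
        rows.append(row)
    return rows
```

Fetching would then be `json.load(urllib.request.urlopen(build_url()))` passed to `parse_power_json`. The network call is left out so the sketch runs offline.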


EDA

Three regularities dominate Nairobi's irradiance series, and they are physical rather than statistical. The first is a twice-yearly low in March–May and October–December (the "long rains" and "short rains"). The second is a twice-yearly high in January–February and July–September. The third is a daily-level anti-correlation with cloud cover that accounts for most of the short-horizon variance.

Figure 1. Top: daily surface irradiance for Nairobi. The bimodal annual pattern is visible: two annual peaks and two annual lows, modulated by inter-annual variability. Middle: daily cloud cover, with the same bimodal cycle inverted. Bottom: precipitation, peaking in the rainy seasons; heavy-rain events coincide with the worst irradiance days.
Figure 2. Annual cycle, monthly means. Irradiance (gold) and cloud cover (blue) are mirror-image bimodal: lowest irradiance and highest cloud cover hit twice a year (April-May and Oct-Nov, the rainy seasons). Precipitation (dark blue) reinforces the same pattern. This is the structure all three models have to learn.
Figure 3. Correlation matrix. Irradiance correlates strongly and negatively with cloud cover (~−0.7) and moderately and negatively with humidity (~−0.4). Temperature is weakly positive; physically, irradiance drives temperature rather than the reverse. The cloud-cover signal is the lever that lets the GBM beat SARIMA: same-day cloud cover carries information that SARIMA can't recover from lagged irradiance alone.

Modelling approach

Three primary candidates plus three baselines, all forecasting daily ALLSKY_SFC_SW_DWN 90 days ahead.

1. SARIMA

SARIMAX(2,0,2)(1,0,1)[7]: non-seasonal AR(2)/MA(2) plus weekly seasonal AR(1)/MA(1). Note the zero differencing orders (d = D = 0). Irradiance is stationary in mean (cloud cycles oscillate around a fixed climatology), so there is no integration term. The weekly seasonal component mostly absorbs noise. Solar irradiance has no physical weekly cycle, but day-of-week sometimes correlates with measurement smoothing.
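For reference, one standard way to write this model uses backshift notation ($B y_t = y_{t-1}$) and assumes a constant mean $\mu$ (our notation, not the case study's). With $d = D = 0$ there are no differencing factors, and the seasonal period $s = 7$ enters only through $B^7$:

$$
(1 - \phi_1 B - \phi_2 B^2)(1 - \Phi_1 B^7)(y_t - \mu) = (1 + \theta_1 B + \theta_2 B^2)(1 + \Theta_1 B^7)\,\varepsilon_t
$$

In statsmodels' `SARIMAX` this corresponds to `order=(2, 0, 2)` and `seasonal_order=(1, 0, 1, 7)`.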

2. State-space — UnobservedComponents + annual Fourier exog

UnobservedComponents with a local linear trend and four pairs of annual Fourier harmonics passed as exogenous regressors. The Fourier order of 4 is chosen because Nairobi's bimodal annual pattern needs more flexibility than a single sinusoid, which can only produce one peak and one trough per year.
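A minimal sketch of the exogenous Fourier block follows. The period of 365.25 days and the plain day counter used as the time index are our assumptions; the write-up only fixes the order at 4. With order 4 the function yields eight columns. Harmonic k completes k cycles per year, so k = 2 is the first term that can produce the two-peak shape.

```python
import numpy as np


def annual_fourier_terms(day_index, order=4, period=365.25):
    """Build sin/cos pairs for harmonics k = 1..order of an annual cycle.

    day_index: integer day counter (e.g. days since the series start).
    Returns an array of shape (len(day_index), 2 * order), with columns
    ordered sin_1, cos_1, sin_2, cos_2, ...
    """
    t = np.asarray(day_index, dtype=float)
    columns = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)
```

The same function has to be evaluated on the forecast dates, continuing the day counter, to supply exog for the 90-day horizon. The resulting matrix is the kind of array one would pass as `exog` to statsmodels' `UnobservedComponents(endog, level="local linear trend", exog=...)`.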

3. ML challenger — GBM with weather covariates

GradientBoostingRegressor(n_estimators=400, max_depth=3, learning_rate=0.05) on engineered features: lagged irradiance plus same-day cloud cover, humidity, precipitation, temperature, and wind.

The cloud and humidity covariates are the lever. SARIMA and UC see only past irradiance (plus calendar Fourier terms for UC); the GBM also sees same-day cloud cover, humidity, and precipitation. The MAPE gap is mostly attributable to that information advantage.
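The exact feature list isn't reproduced in this write-up. The pandas sketch below is therefore hypothetical: it shows one way to build "lagged irradiance plus same-day weather covariates". The lag set (1, 7, 365 days), the day-of-year encoding, and the short column names (`irradiance`, `cloud`, `humidity`, `precip`, `temp`, `wind`) are all assumptions.

```python
import numpy as np
import pandas as pd

COVARIATES = ["cloud", "humidity", "precip", "temp", "wind"]  # hypothetical column names


def make_features(df, lags=(1, 7, 365)):
    """Build a feature matrix from a daily DataFrame indexed by date.

    Expects an 'irradiance' target column plus the same-day covariates.
    Rows without a full lag history are dropped.
    """
    out = pd.DataFrame(index=df.index)
    for lag in lags:
        out[f"irr_lag_{lag}"] = df["irradiance"].shift(lag)
    for col in COVARIATES:
        # Same-day value: it must be observed (or forecast) at prediction time.
        out[col] = df[col]
    doy = df.index.dayofyear.to_numpy()
    out["doy_sin"] = np.sin(2 * np.pi * doy / 365.25)
    out["doy_cos"] = np.cos(2 * np.pi * doy / 365.25)
    out["target"] = df["irradiance"]
    return out.dropna()
```

Everything except `target` would form X and `target` would form y for `GradientBoostingRegressor(n_estimators=400, max_depth=3, learning_rate=0.05)`.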

Baselines

Three reference points: naive-last (predict tomorrow = today), naive-seasonal (predict tomorrow = irradiance one year ago), and monthly climatology (predict tomorrow = average irradiance for that calendar month, computed from the training window).
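A minimal pandas sketch of the three baselines. It assumes a daily `pd.Series` with a `DatetimeIndex` and a forecast horizon that starts the day after the training window ends. Reading "naive-last" as repeating the final training value across the whole horizon is our interpretation for a multi-step forecast.

```python
import pandas as pd


def naive_last(train, horizon_index):
    """Repeat the last observed training value for every forecast date."""
    return pd.Series(train.iloc[-1], index=horizon_index)


def naive_seasonal(train, horizon_index, lag_days=365):
    """Use the value observed lag_days earlier for each forecast date."""
    lagged_dates = horizon_index - pd.Timedelta(days=lag_days)
    return pd.Series(train.reindex(lagged_dates).to_numpy(), index=horizon_index)


def monthly_climatology(train, horizon_index):
    """Predict the training-window mean for each forecast date's calendar month."""
    month_means = train.groupby(train.index.month).mean()
    return pd.Series(month_means.reindex(horizon_index.month).to_numpy(), index=horizon_index)
```

With a 90-day horizon and a 365-day lag, every lagged date falls inside the training window, so the seasonal baseline never needs values from the test period.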

Results

90-day held-out test:

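| Model | MAPE | RMSE (kWh/m²/day) | 95% PI coverage |
|---|---|---|---|
| GBM (weather exog) | 9.42% | 0.68 | n/a |
| Monthly climatology | 12.32% | 0.79 | n/a |
| SARIMA(2,0,2)(1,0,1)[7] | 13.78% | 0.88 | 99% |
| Naive-seasonal (365-day lag) | 16.49% | 1.13 | n/a |
| UC + Fourier annual exog | 21.63% | 1.34 | 99% |
| Naive-last | 25.38% | 1.56 | n/a |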
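The table's three metric columns can be computed as below. This is a sketch. It assumes MAPE is the mean absolute percentage error, which needs strictly positive actuals (true for daily irradiance). It also assumes coverage is the share of realised values falling inside the closed [lower, upper] band of the nominal 95% interval.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes actual > 0."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(100.0 * np.mean(np.abs((actual - forecast) / actual)))


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series (kWh/m²/day here)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def pi_coverage(actual, lower, upper):
    """Percentage of observations inside the closed interval [lower, upper]."""
    actual = np.asarray(actual, float)
    inside = (actual >= np.asarray(lower, float)) & (actual <= np.asarray(upper, float))
    return float(100.0 * np.mean(inside))
```

On a 90-day window each miss moves coverage by about 1.1 points. A rounded 99% therefore means 89 of 90 days inside the interval, comfortably above the 95% nominal level.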
Figure 4. 90-day held-out forecast. The black line is realised irradiance. GBM (blue dashed) tracks the realised series more closely than SARIMA or UC; UC's 95% PI (shaded) is wide enough that nearly every realised observation stays inside.
Climatology is a strong baseline. That's a feature, not a bug. Monthly climatology's 12.3% MAPE beats SARIMA's 13.8%. This suggests that most of the predictable signal in an irradiance series at this latitude is the seasonal envelope: knowing the calendar month gives you most of what's knowable. The GBM's 2.9-point improvement over climatology (12.3% → 9.4%) comes almost entirely from the same-day cloud and humidity covariates, the part that can't be guessed from the calendar alone.

Trade-offs

Deployment sketch

For pay-as-you-go solar operators and grid integrators:

Lessons

  1. Climatology is a real baseline; check it. A SARIMA that doesn't beat monthly climatology is a SARIMA that's overcomplicating the problem. The best forecast is sometimes the average of the calendar month, dressed up with a calibrated interval.
  2. Same-day weather covariates are the differentiator. The 2.9-percentage-point MAPE gain over climatology (4.4 points over SARIMA) comes from the cloud-cover and humidity inputs. If your production system can't observe these in near-real-time, the structural model is your honest ceiling.
  3. NASA POWER plus a thirty-line download script is enough infrastructure for a city-level forecast service. No paid weather data, no licensing, no rate-limit anxieties. The pipeline trivially extends to any African city. The bottleneck is what you do with the forecast, not where the inputs come from.
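Lesson 1's "average of the calendar month, dressed up with a calibrated interval" can be sketched in a few lines. This is our illustration, not the case study's method. The interval here is the empirical 2.5th–97.5th percentile of training days in the same calendar month, an assumption chosen because it needs no fitted model at all.

```python
import pandas as pd


def climatology_with_interval(train, horizon_index, lower_q=0.025, upper_q=0.975):
    """Monthly-mean point forecast plus empirical monthly quantiles as a PI.

    train: daily pd.Series with a DatetimeIndex (the training window only).
    Returns a DataFrame with 'mean', 'lower', 'upper' for each forecast date.
    """
    by_month = train.groupby(train.index.month)
    stats = pd.DataFrame({
        "mean": by_month.mean(),
        "lower": by_month.quantile(lower_q),
        "upper": by_month.quantile(upper_q),
    })
    # Look up each forecast date's calendar month; .to_numpy() drops the month
    # index so the rows can be relabelled with the forecast dates.
    return pd.DataFrame(stats.reindex(horizon_index.month).to_numpy(),
                        index=horizon_index, columns=stats.columns)
```

Whether such an interval is actually calibrated has to be checked on held-out data, the same way the SARIMA and UC coverage figures were.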