Case study · Energy · Time-series

Hourly load forecasting — PJM East

Day-ahead and short-horizon load forecasts on 16 years of real PJME hourly consumption data — comparing SARIMA, state-space (UnobservedComponents + Fourier exog), and a gradient-boosted forecaster on a held-out 60-day window.

Read · 5 min · 1,025 words
Best model · GBM with engineered features
MAPE · 6.2% (vs SARIMA 14.5%, UC 19.3%)
Data · 145,392 hourly observations · 2002–2018
UC 95% PI coverage · 100%

Summary

A gradient-boosted regressor with engineered lag, rolling, and harmonic-calendar features more than halves day-ahead MAPE on PJM East load relative to classical SARIMA: 6.2% vs. 14.5%. A structural state-space model (UnobservedComponents with annual Fourier exog) trails on point-forecast accuracy but delivers conservative, complete prediction-interval coverage (100% empirical coverage at the nominal 95% level), which matters for risk-aware procurement.

The right model depends on what the trading desk is optimising for. If the only goal is the lowest expected error, GBM wins; if the desk is sizing hedges and needs honest uncertainty bands, state-space wins.

The business question

A regional grid operator buys electricity day-ahead and rebalances intra-day. Forecast errors translate directly into imbalance penalties: the operator ends up short or long against actual demand and pays the spread. Two operational questions sit on top of the same forecast: what is the expected load (the point forecast that sizes the day-ahead purchase), and how wrong could that number plausibly be (the interval that sizes the hedge)?

A SARIMA baseline was already in place. Could a state-space model or an ML model do better, and on which dimension?

Data

Real PJM East (PJME) hourly metered consumption from the public Kaggle dataset robikscube/hourly-energy-consumption: 145,392 hourly observations from 1 Jan 2002 to 3 Aug 2018. The case study uses the trailing window 2015-01-01 → 2018-08-03 (~31,400 hours / ~3.6 years) to keep training tractable on a single machine.
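A loading sketch for the Kaggle file. It assumes the dataset's shipped layout (`PJME_hourly.csv` with columns `Datetime` and `PJME_MW`); the interpolation step mirrors the "no NaNs after interpolation" note:

```python
import pandas as pd

def load_pjme(path: str, start: str = "2015-01-01") -> pd.Series:
    """Load the Kaggle PJME CSV and return the hourly series from `start` on.

    Assumes the robikscube/hourly-energy-consumption layout:
    columns `Datetime` and `PJME_MW`.
    """
    df = pd.read_csv(path, parse_dates=["Datetime"], index_col="Datetime")
    s = df["PJME_MW"].sort_index()
    s = s.asfreq("H")           # enforce a regular hourly index (gaps -> NaN)
    s = s.interpolate(limit=3)  # patch isolated gaps only
    return s.loc[start:]
```

`asfreq` makes missing hours explicit before interpolation, so short gaps are filled but long outages would still surface as NaNs.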

CSV · Kaggle · Hourly resolution · Single series · No NaNs after interpolation

EDA

Three regularities dominate the series: a strong daily cycle (peak around 18:00, trough around 04:00), a weekly cycle with weekend dips, and an annual cycle driven by summer cooling and winter heating loads. A holiday calendar effect is visible (Christmas, July 4) but smaller than the seasonal envelope.

PJME load: last 30 days (hourly), full subset daily mean, monthly mean
Figure 1. Top: last 30 days of hourly load showing the daily cycle and weekend dip. Middle: daily-mean over the full 2015–2018 subset. Bottom: monthly mean exposing the U-shaped annual cycle (summer cooling and winter heating peaks).
Hour-of-day × day-of-week heatmap of mean load
Figure 2. Mean load by hour-of-day × day-of-week. Weekday afternoons concentrate the load; weekends are uniformly lower. Any forecast model has to absorb at least these two cycles to be competitive.
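The Figure 2 heatmap reduces to one pivot table; a minimal sketch (function name mine):

```python
import pandas as pd

def hour_dow_table(load: pd.Series) -> pd.DataFrame:
    """Mean load by hour-of-day (rows) x day-of-week (cols, Mon=0)."""
    df = load.to_frame("mw")
    df["hour"] = df.index.hour
    df["dow"] = df.index.dayofweek
    return df.pivot_table(index="hour", columns="dow",
                          values="mw", aggfunc="mean")
```

The resulting 24×7 table is what gets rendered as the heatmap.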

Modelling approach

Three candidates, all trained on the daily-mean series for tractability and forecast over a 60-day held-out window:

1. SARIMA baseline

SARIMAX(2,1,2)(1,1,1)[7] — non-seasonal AR/MA + weekly seasonal AR/MA. Captures the weekly cycle directly but absorbs the annual cycle only through the slow-moving non-stationary integration term, which under-fits the U-shape.

2. State-space — UnobservedComponents + Fourier exog

UnobservedComponents(level='local linear trend', seasonal=7, exog=Fourier(365.25, order=3)). The local-linear-trend component absorbs the slow drift, the discrete weekly seasonal handles day-of-week, and three pairs of annual Fourier harmonics passed as exogenous regressors absorb the annual cycle smoothly. Maximum-likelihood fit with the Kalman filter for predictive intervals.

3. ML challenger — gradient-boosted regressor

GradientBoostingRegressor(n_estimators=400, max_depth=3, learning_rate=0.05) on engineered features: daily lags, trailing rolling statistics, and harmonic calendar terms (day-of-week plus annual sine/cosine pairs).

No exogenous weather inputs — temperature would almost certainly improve all three, but the goal here is to compare the modelling families on the same purely-endogenous information set.
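The precise feature set isn't enumerated above; a plausible reconstruction (lag choices and names are mine) with the stated hyper-parameters:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def make_features(daily: pd.Series) -> pd.DataFrame:
    """Lag, rolling, and harmonic-calendar features on the daily-mean series."""
    df = pd.DataFrame({"y": daily})
    for lag in (1, 7, 14, 28):                      # illustrative lag set
        df[f"lag{lag}"] = daily.shift(lag)
    df["roll7"] = daily.shift(1).rolling(7).mean()  # trailing weekly mean
    df["roll28"] = daily.shift(1).rolling(28).mean()
    df["dow"] = daily.index.dayofweek
    t = daily.index.dayofyear / 365.25              # annual harmonics
    df["sin1"], df["cos1"] = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
    return df.dropna()

def fit_gbm(daily: pd.Series):
    feats = make_features(daily)
    X, y = feats.drop(columns="y"), feats["y"]
    gbm = GradientBoostingRegressor(n_estimators=400, max_depth=3,
                                    learning_rate=0.05)
    return gbm.fit(X, y), X.columns
```

Note that over a 60-day window the short lags are unknown past day one, so the evaluation either forecasts recursively (feeding predictions back in as lags) or restricts itself to lags at or beyond the horizon.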

Results

On the 60-day held-out window:

| Model | MAPE | RMSE (MW) | 95% PI coverage |
| --- | --- | --- | --- |
| GBM (engineered features) | 6.24% | 2,653 | — (no native interval) |
| SARIMA(2,1,2)(1,1,1)[7] | 14.45% | 6,676 | 90.0% |
| UC + Fourier annual exog | 19.28% | 8,295 | 100.0% |
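For reproducibility, the three headline metrics, assuming aligned arrays of realised and forecast daily means and interval bounds `lo`/`hi`:

```python
import numpy as np

def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100 * np.mean(np.abs((y_true - y_pred) / y_true)))

def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the series' units (MW here)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pi_coverage(y_true, lo, hi) -> float:
    """Share of realised values inside the interval, in percent."""
    y_true = np.asarray(y_true, float)
    return float(100 * np.mean((np.asarray(lo, float) <= y_true)
                               & (y_true <= np.asarray(hi, float))))
```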
60-day forecast comparison: SARIMA vs UC vs GBM
Figure 3. 60-day held-out forecast. The GBM tracks the seasonal envelope visibly tighter than SARIMA or UC; SARIMA's 95% prediction interval (shaded) achieves 90% empirical coverage; UC drifts slightly low on the mean but its interval contains every realised observation.
The trade-off in one line. If the question is "what's tomorrow's load?", GBM wins (~57% MAPE reduction over SARIMA). If the question is "give me a 95%-confident range for tomorrow's load," state-space wins — its interval covered every realised observation in the window (conservative rather than tight, but honest), while GBM has no native interval at all (it would need quantile-regression boosting or conformal prediction to compete on this dimension).
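Quantile-regression boosting is the smaller lift of those two options, since it reuses the same model family; a minimal sketch (hyper-parameters illustrative, not tuned):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def quantile_gbm_interval(X_train, y_train, X_test, alpha: float = 0.05):
    """Approximate (1 - alpha) interval from two quantile-loss GBMs."""
    bounds = []
    for q in (alpha / 2, 1 - alpha / 2):   # e.g. 2.5% and 97.5% quantiles
        m = GradientBoostingRegressor(loss="quantile", alpha=q,
                                      n_estimators=200, max_depth=3,
                                      learning_rate=0.05)
        bounds.append(m.fit(X_train, y_train).predict(X_test))
    return bounds[0], bounds[1]
```

The two quantile models are fit independently, so the resulting band has no coverage guarantee; conformal prediction is the heavier but calibrated alternative.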

Trade-offs

GBM buys the lowest point error, but it has no native uncertainty and learns the seasonality only from the features it is given. The state-space model buys honest, interpretable intervals and explicit structure at the cost of the weakest point forecasts here. SARIMA sits in between: simple, already deployed, with reasonable interval calibration, but it under-fits the annual cycle and loses on every point metric.

Deployment sketch

In production I would ship the ensemble, not just the winner: GBM for the point forecast, UC for the prediction interval. Operationally: refit both models on a rolling window, publish the GBM mean alongside the UC band each morning, and monitor MAPE and empirical interval coverage side by side so drift in either model surfaces quickly.
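One way to express the ensemble contract in code (a sketch; all names are mine):

```python
from dataclasses import dataclass

@dataclass
class EnsembleForecast:
    """Point forecast from the GBM, interval from the state-space model."""
    mean: float  # GBM point forecast (MW)
    lo: float    # UC 95% interval, lower bound
    hi: float    # UC 95% interval, upper bound

    def flag(self) -> bool:
        # Monitoring hook: the GBM point should normally sit inside the
        # UC interval; a breach is a cheap drift signal worth alerting on.
        return not (self.lo <= self.mean <= self.hi)
```

Keeping the two models' outputs in one object makes the disagreement between them an explicit, monitorable quantity rather than something buried in two pipelines.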

Lessons

  1. "Best" is metric-conditional. Six percent MAPE looks decisive against fourteen percent. But the state-space model's calibrated intervals are operationally more valuable for some downstream uses than the GBM's tighter mean.
  2. Engineered features beat structural priors when the data is rich. 31k hours is plenty for GBM to learn the seasonality the structural model has to be told about. With 5 years of monthly data instead of 4 years of hourly data, this ranking would likely flip.
  3. Don't pick a model. Pick a forecasting system. Mean from the ML model, interval from the structural model, monitoring on both.