Applied Bayesian Statistics — 4-Day Workshop
4-day workshop on applied Bayesian statistics: PyMC, MCMC, hierarchical models.
- Instructor: Dr. Yaé Ulrich Gaba
- Duration: 4 days (24 hours)
- Level: Intermediate to Advanced
- Language: English
Overview
This workshop provides a practical introduction to Bayesian statistics with emphasis on modelling, computation, and real-world applications. Participants learn Bayesian thinking, build probabilistic models with PyMC, and apply hierarchical methods to problems in health, finance, and social science. The workshop bridges mathematical rigour with hands-on implementation.
Prerequisites
- Probability and statistics basics (distributions, likelihood, conditional probability, Bayes’ theorem)
- Python programming (NumPy, Matplotlib)
- Some familiarity with regression (linear/logistic) is helpful
- No prior Bayesian experience required
Learning Objectives
By the end of this workshop, participants will be able to:
- Think probabilistically and formulate problems in a Bayesian framework
- Specify prior distributions and understand their impact on inference
- Build and fit Bayesian models with PyMC
- Understand and diagnose MCMC sampling (trace plots, R-hat, effective sample size)
- Construct hierarchical (multilevel) models
- Perform model comparison and posterior predictive checks
- Apply Bayesian methods to domain-specific problems
Software Requirements
- Python 3.10+
- Libraries: pymc (v5+), arviz, numpy, matplotlib, seaborn, pandas, scipy
- Optional: Stan (via cmdstanpy), bambi (formula-based Bayesian models)
Day-by-Day Program
Day 1: Bayesian Thinking & First Models
Objectives: Understand the Bayesian paradigm and build first probabilistic models.
| Time | Topic |
|---|---|
| 09:00–10:00 | Why Bayesian? — Frequentist vs. Bayesian philosophy, probability as belief, advantages of the Bayesian approach: uncertainty quantification, small data, prior knowledge incorporation |
| 10:00–10:45 | Bayes’ Theorem in Practice — Posterior ∝ Prior × Likelihood (normalization by the evidence). Conjugate priors, analytical examples: Beta-Binomial, Normal-Normal |
| 10:45–11:00 | Break |
| 11:00–12:30 | Choosing Priors — Informative vs. weakly informative vs. non-informative priors. Prior predictive checks: does the prior generate plausible data? Common priors for standard parameters |
| 12:30–14:00 | Lunch |
| 14:00–15:30 | Introduction to PyMC — Model specification, random variables, observed data, sampling with NUTS, trace plots, ArviZ for diagnostics and visualization |
| 15:30–15:45 | Break |
| 15:45–17:00 | Bayesian Linear Regression — Normal likelihood, priors on coefficients and variance, posterior interpretation, credible intervals vs. confidence intervals, posterior predictive distribution |
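The Beta-Binomial conjugacy from the morning session can be verified numerically before any sampler is involved. A minimal sketch (coin-flip numbers are illustrative): with a Beta(a, b) prior and k successes in n trials, the posterior is Beta(a + k, b + n − k), which we check against a grid approximation of prior × likelihood.

```python
import numpy as np
from scipy import stats

# Beta(2, 2) prior on a coin's heads probability
a, b = 2.0, 2.0
k, n = 7, 10  # observed 7 heads in 10 flips

# Conjugate update: posterior is Beta(a + k, b + n - k)
post = stats.beta(a + k, b + n - k)
analytic_mean = post.mean()  # (a + k) / (a + b + n) = 9/14

# Check against a grid approximation of prior x likelihood
theta = np.linspace(0.0, 1.0, 10_001)
unnorm = stats.beta(a, b).pdf(theta) * stats.binom(n, theta).pmf(k)
grid_mean = (theta * unnorm).sum() / unnorm.sum()

print(analytic_mean, grid_mean)
```

The same grid check works for any one-parameter model, which makes it a useful sanity test when a conjugate result is available.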
Lab 1: Build a Bayesian linear regression model in PyMC: predict a health outcome (e.g., blood pressure ~ age + BMI). Explore the effect of different priors, visualize the posterior, and compare with frequentist OLS.
Homework: Fit a Bayesian regression on a dataset of your choice. Experiment with prior sensitivity.
Day 2: MCMC, Diagnostics & Generalized Models
Objectives: Understand how MCMC works and extend Bayesian models beyond linear regression.
| Time | Topic |
|---|---|
| 09:00–09:30 | Homework Review |
| 09:30–10:30 | How MCMC Works — The sampling problem, Metropolis-Hastings (intuition), Hamiltonian Monte Carlo (HMC), NUTS (the No-U-Turn Sampler). Why NUTS is the default |
| 10:30–10:45 | Break |
| 10:45–12:00 | Diagnosing MCMC — Trace plots, autocorrelation, R-hat (convergence), effective sample size (ESS), divergences in HMC. What to do when sampling fails: reparameterization, non-centered parameterization |
| 12:00–12:30 | Model Criticism — Posterior predictive checks: does the model generate data that looks like the real data? Residual analysis, calibration |
| 12:30–14:00 | Lunch |
| 14:00–15:30 | Bayesian Logistic Regression — Bernoulli/Binomial likelihood, logit link, priors for coefficients, interpreting posterior odds ratios, classification with uncertainty |
| 15:30–15:45 | Break |
| 15:45–17:00 | Bayesian GLMs — Poisson regression for count data, negative binomial for overdispersion, choosing the right likelihood family |
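The Metropolis-Hastings intuition from the morning session can be demonstrated in a few lines of NumPy: a random-walk sampler targeting a standard normal. This is illustrative only (the workshop itself uses NUTS via PyMC), but it shows the accept/reject mechanics concretely.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Log-density of N(0, 1), up to an additive constant
    return -0.5 * x ** 2

# Random-walk Metropolis: propose, then accept with prob min(1, p(x')/p(x))
x, samples = 0.0, []
for _ in range(50_000):
    proposal = x + rng.normal(0, 1.0)
    if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
        x = proposal
    samples.append(x)

draws = np.array(samples[5_000:])  # discard burn-in
print(draws.mean(), draws.std())   # should approximate 0 and 1
```

Note that rejected proposals still append the current state; forgetting that step is a classic bug that biases the sampler.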
Lab 2: Build a Bayesian logistic regression for disease diagnosis (e.g., diabetes prediction). Perform full MCMC diagnostics, posterior predictive checks, and compare predicted probabilities with a frequentist logistic regression.
Homework: Fit a Poisson model to count data (e.g., number of doctor visits) and check for overdispersion.
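A quick screening step for the overdispersion homework: under a Poisson model the variance roughly equals the mean, so an index of dispersion well above 1 signals trouble. A sketch on simulated negative-binomial counts (the mean and size parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# Negative-binomial counts: mean 3, but extra-Poisson variation
counts = rng.negative_binomial(n=2, p=2 / (2 + 3), size=5_000)

# Index of dispersion: ~1 for Poisson data, > 1 signals overdispersion
dispersion = counts.var() / counts.mean()
print(counts.mean(), dispersion)
```

This is only a heuristic; the Bayesian version of the same check is a posterior predictive comparison of the variance statistic under the fitted Poisson model.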
Day 3: Hierarchical Models
Objectives: Build multilevel models that share information across groups.
| Time | Topic |
|---|---|
| 09:00–09:30 | Homework Review |
| 09:30–10:30 | Why Hierarchical? — The problem: too many groups, too little data per group. Complete pooling vs. no pooling vs. partial pooling. Shrinkage and the James-Stein phenomenon |
| 10:30–10:45 | Break |
| 10:45–12:30 | Hierarchical Linear Models — Varying intercepts, varying slopes, group-level predictors. The non-centered parameterization for efficient sampling. Visualizing partial pooling |
| 12:30–14:00 | Lunch |
| 14:00–15:30 | Hierarchical Models for Real Data — Multi-country health data: estimating country-level effects with partial pooling. Cross-classified and nested structures |
| 15:30–15:45 | Break |
| 15:45–17:00 | Model Comparison — WAIC, LOO-CV (Leave-One-Out Cross-Validation) with ArviZ, comparing models with different structures, information criteria interpretation |
Lab 3: Build a hierarchical model for educational outcomes across African countries: student test scores nested within schools within countries. Compare complete pooling, no pooling, and hierarchical estimates. Visualize shrinkage.
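The shrinkage that Lab 3 visualizes can be sketched analytically in the known-variance normal-normal case: each group mean is pulled toward the grand mean by a precision-weighted compromise. All numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated school means: 8 schools, few students each (hypothetical numbers)
true_effects = rng.normal(50, 5, 8)  # tau = 5 between schools
n_per, sigma = 5, 10                 # 5 students per school, sigma = 10 within
ybar = true_effects + rng.normal(0, sigma / np.sqrt(n_per), 8)

# Partial pooling: precision-weighted compromise between each school's
# raw mean and the grand mean (known-variance normal-normal model)
mu, tau = ybar.mean(), 5.0
w = (n_per / sigma**2) / (n_per / sigma**2 + 1 / tau**2)
shrunk = w * ybar + (1 - w) * mu

print(w)       # pooling weight: 1 = no pooling, 0 = complete pooling
print(shrunk)  # estimates sit strictly between raw means and grand mean
```

In the full hierarchical model, `mu`, `tau`, and hence the weight `w` are themselves estimated from the data rather than fixed, but the geometry of the shrinkage is the same.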
Homework: Extend the hierarchical model with a group-level predictor (e.g., school funding level).
Day 4: Advanced Topics & Applications
Objectives: Apply Bayesian methods to domain-specific problems and explore advanced techniques.
| Time | Topic |
|---|---|
| 09:00–09:30 | Homework Review |
| 09:30–10:30 | Bayesian Time Series — Autoregressive priors, Gaussian processes for time series, structural time series models, changepoint detection |
| 10:30–10:45 | Break |
| 10:45–12:00 | Mixture Models & Clustering — Gaussian mixture models, Bayesian nonparametrics (Dirichlet Process intuition), latent variable models |
| 12:00–12:30 | Bayesian A/B Testing — Comparing treatments/interventions, posterior probability of superiority, decision-making under uncertainty, advantages over p-values |
| 12:30–14:00 | Lunch |
| 14:00–15:00 | Domain Applications — Case studies: clinical trial analysis, credit risk modelling, epidemiological modelling (SIR with Bayesian inference), survey data analysis |
| 15:00–15:15 | Break |
| 15:15–16:15 | Capstone Project Work — Complete a Bayesian analysis on a chosen dataset |
| 16:15–17:00 | Presentations & Wrap-Up — Project presentations, Bayesian workflow summary, resources, certificates |
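The Bayesian A/B testing session reduces to a few lines when conversion counts get conjugate Beta posteriors; the counts below are made up. Monte Carlo draws from the two posteriors give the posterior probability of superiority directly.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical conversions: A 120/1000, B 150/1000, flat Beta(1, 1) priors
post_a = rng.beta(1 + 120, 1 + 880, size=100_000)
post_b = rng.beta(1 + 150, 1 + 850, size=100_000)

# Posterior probability that B beats A -- the quantity a p-value cannot give
p_b_better = (post_b > post_a).mean()
print(p_b_better)
```

The same draws also yield the posterior distribution of the lift `post_b - post_a`, which feeds directly into the session's decision-making-under-uncertainty discussion.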
Lab 4 (Capstone): Choose one project:
- Health: Bayesian disease prevalence estimation with hierarchical models across regions
- Finance: Credit default modelling with Bayesian logistic regression and uncertainty quantification
- Education: Multilevel model of student performance with school and country effects
- Custom: Apply Bayesian methods to a problem from your own domain
Assessment
- Daily labs (40%) — Working models with proper diagnostics
- Capstone project (40%) — Complete Bayesian analysis with interpretation
- Participation (20%) — Engagement, homework, and discussions
Resources
- Bayesian Analysis with Python (3rd ed.) — Osvaldo Martin
- Statistical Rethinking (2nd ed.) — Richard McElreath
- PyMC Documentation
- ArviZ Documentation
- Bayesian Data Analysis (Gelman et al.)
Certificate
Participants who complete all labs and the capstone project receive a certificate of completion.