Python for Data Science — 5-Day Workshop

Instructor: Dr. Yaé Ulrich Gaba
Duration: 5 days (30 hours)
Level: Beginner to Intermediate
Language: English

Overview

This hands-on workshop takes participants from zero Python experience to building their first machine learning models. Through daily coding labs, real-world datasets, and progressive projects, learners develop practical data science skills grounded in solid programming fundamentals.

Prerequisites

Basic computer literacy (file management, web browsing)
High school mathematics (algebra, basic statistics)
No prior programming experience required
Laptop with internet access (Python will be installed on Day 1)

Learning Objectives

By the end of this workshop, participants will be able to:

Write Python scripts and use Jupyter notebooks for data analysis
Manipulate and clean datasets using Pandas
Create informative visualizations with Matplotlib and Seaborn
Perform exploratory data analysis (EDA) on real-world datasets
Build, evaluate, and interpret basic ML models with scikit-learn

Software Requirements

Python 3.10+
Jupyter Notebook / JupyterLab
Libraries: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn

Recommended setup: Anaconda Distribution, which bundles Python, Jupyter, and all of the libraries listed above

Day-by-Day Program

Day 1: Python Fundamentals

Objectives: Install Python, understand core syntax, write first programs.

Time	Topic
09:00–10:30	Setup & First Steps — Installing Anaconda, launching Jupyter, cells & execution, Markdown basics
10:30–10:45	Break
10:45–12:30	Core Syntax — Variables, types (int, float, str, bool), operators, string formatting, input/output
12:30–14:00	Lunch
14:00–15:30	Control Flow — Conditionals (if/elif/else), loops (for, while), range(), list comprehensions
15:30–15:45	Break
15:45–17:00	Functions & Modules — Defining functions, parameters, return values, importing modules, `math`, `random`

Lab 1: Write a program that analyzes student grades — compute mean, median, min/max, and assign letter grades.
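One possible solution sketch for Lab 1, using only the standard library. The letter-grade cutoffs (90/80/70/60 on a 0 to 100 scale) and the function names `letter_grade` and `summarize` are illustrative assumptions, not part of the workshop materials.

```python
import statistics

# Hypothetical cutoffs on a 0-100 scale; adjust to your own grading policy.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def letter_grade(score: float) -> str:
    """Return the letter for the first cutoff the score meets, else 'F'."""
    for threshold, letter in CUTOFFS:
        if score >= threshold:
            return letter
    return "F"


def summarize(grades: dict[str, float]) -> dict:
    """Compute summary statistics plus a letter grade for each student."""
    scores = list(grades.values())
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
        "letters": {name: letter_grade(s) for name, s in grades.items()},
    }


if __name__ == "__main__":
    grades = {"Amina": 92, "Kofi": 78, "Lea": 85, "Tomas": 59}
    report = summarize(grades)
    print(f"Mean: {report['mean']:.1f}, median: {report['median']}")
    print(f"Range: {report['min']} to {report['max']}")
    for name, letter in report["letters"].items():
        print(f"{name}: {letter}")
```

Keeping the cutoffs in a list of pairs means the loop, not a chain of `if/elif` branches, decides the grade, so changing the scale only touches `CUTOFFS`.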

Homework: Create a number-guessing game using loops and conditionals.

Day 2: Data Structures & NumPy

Objectives: Master Python collections and numerical computing with NumPy.

Time	Topic
09:00–09:30	Homework Review — Discussion and Q&A
09:30–10:30	Data Structures — Lists, tuples, dictionaries, sets, nesting, common methods
10:30–10:45	Break
10:45–12:30	File I/O & Error Handling — Reading/writing CSV and text files, try/except blocks, with statements (context managers)
12:30–14:00	Lunch
14:00–15:30	NumPy Fundamentals — Arrays, shapes, dtypes, indexing, slicing, broadcasting
15:30–15:45	Break
15:45–17:00	NumPy Operations — Vectorized operations, aggregations, linear algebra basics, random number generation
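The NumPy topics above can be sketched on a small array. The temperature values are made up for illustration; the calls themselves (`mean(axis=...)`, broadcasting via `np.newaxis`, `np.linalg.solve`, `np.random.default_rng`) are standard NumPy.

```python
import numpy as np

# Hypothetical readings: 3 cities (rows) x 4 days (columns), in degrees Celsius.
temps = np.array([
    [21.0, 23.5, 22.0, 24.5],
    [15.0, 14.5, 16.0, 17.5],
    [30.0, 31.0, 29.5, 32.0],
])
print(temps.shape, temps.dtype)  # (3, 4) float64

# Aggregation along an axis: axis=1 collapses the day columns, one mean per city.
city_means = temps.mean(axis=1)

# Broadcasting: the (3, 1) column of means is stretched across the 4 columns,
# so each reading is centered on its own city's mean without a Python loop.
anomalies = temps - city_means[:, np.newaxis]

# Vectorized arithmetic: scalar operations apply element-wise to the whole array.
fahrenheit = temps * 9 / 5 + 32

# Linear algebra basics: solve the system 2x + y = 5, x - y = 1.
A = np.array([[2.0, 1.0], [1.0, -1.0]])
b = np.array([5.0, 1.0])
solution = np.linalg.solve(A, b)  # x = 2, y = 1

# Reproducible random numbers with the Generator API.
rng = np.random.default_rng(seed=42)
noise = rng.normal(loc=0.0, scale=1.0, size=temps.shape)
```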

Lab 2: Load a CSV file of weather data with plain Python (the csv module and lists), then redo the same task with NumPy. Compare performance and code readability.
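A minimal sketch of the Lab 2 comparison, assuming a weather file with hypothetical columns `date`, `temp_c`, and `rain_mm`. The data is inlined with `io.StringIO` so the example runs without a file, and the 1 mm rainy-day threshold is an arbitrary choice.

```python
import csv
import io

import numpy as np

# Hypothetical weather CSV; in the lab this would come from open("weather.csv").
CSV_TEXT = """date,temp_c,rain_mm
2024-01-01,24.1,0.0
2024-01-02,25.3,3.2
2024-01-03,23.8,12.5
2024-01-04,26.0,0.4
"""

# Plain Python: parse each row into lists, then aggregate with loops.
temps, rain = [], []
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    temps.append(float(row["temp_c"]))
    rain.append(float(row["rain_mm"]))
mean_temp_py = sum(temps) / len(temps)
rainy_days_py = sum(1 for r in rain if r > 1.0)  # assumed threshold: 1 mm

# NumPy: load the two numeric columns into a 2-D array in one call.
data = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", skiprows=1, usecols=(1, 2))
mean_temp_np = data[:, 0].mean()
rainy_days_np = int((data[:, 1] > 1.0).sum())  # boolean mask, counted without a loop

print(mean_temp_py, mean_temp_np)
print(rainy_days_py, rainy_days_np)
```

On a four-row file both versions run equally fast; the performance difference only becomes visible when timing both (for example with `timeit`) on a file with many thousands of rows.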

Homework: Use NumPy to simulate 10,000 dice rolls and plot the distribution of sums.
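A sketch of the dice homework, assuming each roll throws two fair six-sided dice (the exercise does not say how many dice are summed).

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so results are reproducible

N_ROLLS = 10_000
# Assumption: two dice per roll. integers() excludes `high`, so high=7 gives faces 1..6.
dice = rng.integers(low=1, high=7, size=(N_ROLLS, 2))
sums = dice.sum(axis=1)  # one total per roll, values 2..12

# bincount counts occurrences of each integer value; slice off totals 0 and 1.
counts = np.bincount(sums, minlength=13)[2:]
for total, count in zip(range(2, 13), counts):
    print(f"{total:2d}: {count / N_ROLLS:.3f}")
```

To plot the distribution, Matplotlib's `plt.bar(range(2, 13), counts)` is enough; the bars should approximate a triangular shape peaking at 7, whose exact probability is 6/36.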

Day 3: Pandas — Data Wrangling

Objectives: Load, clean, transform, and explore datasets with Pandas.

Time	Topic
09:00–09:30	Homework Review
09:30–10:30	Pandas Basics — Series, DataFrame, read_csv, head/tail/info/describe, dtypes
10:30–10:45	Break
10:45–12:30	Selection & Filtering — loc/iloc, boolean indexing, query(), column operations, sorting
12:30–14:00	Lunch
14:00–15:30	Data Cleaning — Missing values (isna, fillna, dropna), duplicates, type conversion, string methods
15:30–15:45	Break
15:45–17:00	Aggregation & Grouping — groupby, agg, pivot_table, merge/join, concat
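The selection, filtering, and grouping operations from these sessions can be sketched on a small made-up sales table; all column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical sales records for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "East"],
    "product": ["A", "A", "B", "B", "A", "A"],
    "units": [120, 95, 60, 80, 110, 70],
    "price": [2.5, 2.5, 4.0, 4.0, 2.5, 2.5],
})
df["revenue"] = df["units"] * df["price"]  # column operation, computed element-wise

# Label-based vs. position-based selection.
first_region = df.loc[0, "region"]  # row label 0, column "region"
top_left = df.iloc[:2, :2]          # first two rows and first two columns by position

# Boolean indexing and query() express the same filter.
high_units = df[df["units"] > 90]
high_units_q = df.query("units > 90")

# Sorting, then grouping with named aggregations.
ranked = df.sort_values("revenue", ascending=False)
by_region = df.groupby("region").agg(
    total_units=("units", "sum"),
    mean_price=("price", "mean"),
)

# pivot_table: regions as rows, products as columns, summed revenue in each cell.
pivot = df.pivot_table(index="region", columns="product", values="revenue",
                       aggfunc="sum", fill_value=0)
print(by_region)
print(pivot)
```

`fill_value=0` matters here because South sold no product B; without it that cell would be NaN.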

Lab 3: Clean and analyze a messy real-world dataset (e.g., World Bank development indicators for African countries). Handle missing values, merge multiple files, and produce summary statistics by country/year.
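A condensed sketch of the Lab 3 cleaning steps. The two tables and their values are placeholders invented for illustration, not real World Bank data, and filling missing growth values with each country's mean is only one of several defensible choices.

```python
import pandas as pd

# Placeholder values invented for illustration; NOT real World Bank figures.
growth = pd.DataFrame({
    "country": ["Country A", "Country A", "Country A", "Country B", "Country B", "Country C"],
    "year": [2020, 2021, 2021, 2020, 2021, 2020],
    "gdp_growth": ["0.5", "5.1", "5.1", "-0.3", "n/a", "1.3"],  # numbers stored as text
})
population = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B", "Country B", "Country C"],
    "year": [2020, 2021, 2020, 2021, 2020],
    "population_m": [10.0, 10.2, 20.0, 20.5, 5.0],
})

# 1. Type conversion: unparseable entries such as "n/a" become NaN.
growth["gdp_growth"] = pd.to_numeric(growth["gdp_growth"], errors="coerce")

# 2. Drop exact duplicate rows (Country A, 2021 appears twice).
growth = growth.drop_duplicates()

# 3. Missing values: count them, then fill each gap with that country's mean.
print(growth.isna().sum())
country_mean = growth.groupby("country")["gdp_growth"].transform("mean")
growth["gdp_growth"] = growth["gdp_growth"].fillna(country_mean)

# 4. Merge on the shared keys; validate= raises if either side has duplicate keys.
merged = growth.merge(population, on=["country", "year"], how="inner",
                      validate="one_to_one")

# 5. Summary statistics by country.
summary = merged.groupby("country").agg(
    mean_growth=("gdp_growth", "mean"),
    years=("year", "nunique"),
    mean_population_m=("population_m", "mean"),
)
print(summary)
```

Deduplicating before the merge is what lets `validate="one_to_one"` pass; skipping step 2 would make the merge raise instead of silently duplicating rows.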

Homework: Prepare a cleaned dataset and write 5 analytical questions you want to answer with visualization.

Day 4: Data Visualization

Objectives: Create publication-quality plots and perform exploratory data analysis.

Time	Topic
09:00–09:30	Homework Review
09:30–10:30	Matplotlib Fundamentals — figure/axes model, plot(), scatter(), bar(), hist(), customization
10:30–10:45	Break
10:45–12:30	Seaborn for Statistical Visualization — histplot and kdeplot (replacements for the deprecated distplot), boxplot, heatmap, pairplot, catplot, styling
12:30–14:00	Lunch
14:00–15:30	Exploratory Data Analysis (EDA) — A systematic approach to distributions, correlations, outliers, and patterns, guided by an EDA workflow checklist
15:30–15:45	Break
15:45–17:00	Advanced Plots & Storytelling — Subplots, annotations, color palettes, saving figures, dashboard-style layouts
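A sketch of the Matplotlib figure/axes model with subplots, an annotation, and a saved figure. The data is synthetic; `matplotlib.use("Agg")` is only needed when running as a script without a display (not inside Jupyter), and the figure is written to an in-memory buffer so the example leaves no files behind.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; unnecessary in Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)
x = np.linspace(0, 10, 50)
y = 2 * x + rng.normal(scale=2.0, size=x.size)  # synthetic noisy linear data

# Figure/axes model: one Figure holding a 1x2 grid of Axes objects.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)

ax_left.scatter(x, y, color="tab:blue", alpha=0.7, label="observations")
ax_left.plot(x, 2 * x, color="tab:orange", label="true trend y = 2x")
peak = np.argmax(y)
ax_left.annotate("max", xy=(x[peak], y[peak]), xytext=(x[peak] - 3, y[peak]),
                 arrowprops={"arrowstyle": "->"})
ax_left.set(title="Scatter with trend", xlabel="x", ylabel="y")
ax_left.legend()

ax_right.hist(y - 2 * x, bins=15, color="tab:green", edgecolor="black")
ax_right.set(title="Residuals", xlabel="y - 2x", ylabel="count")

fig.suptitle("Two-panel overview")

# Save at print resolution; a path such as "overview.png" works in place of the buffer.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300, bbox_inches="tight")
plt.close(fig)
```

Calling methods on the `Axes` objects (`ax_left.scatter`, `ax_right.hist`) rather than on `plt` keeps each panel's styling explicit, which scales better to dashboard-style layouts.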

Lab 4: Perform a complete EDA on the cleaned dataset from Day 3. Answer the 5 questions with appropriate visualizations. Create a mini-report with narrative and figures.
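The distributions, correlations, and outliers steps of an EDA checklist can be sketched with pandas alone. The synthetic DataFrame stands in for the cleaned Day 3 dataset, and the 1.5 × IQR outlier rule (Tukey's fences) is one common convention, not the only one.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned Day 3 dataset (an assumption for illustration).
rng = np.random.default_rng(seed=7)
df = pd.DataFrame({
    "income": rng.normal(50, 10, 200),
    "spending": rng.normal(30, 5, 200),
})
df.loc[0, "income"] = 250.0  # planted outlier so step 3 has something to find
df["savings"] = df["income"] - df["spending"]

# 1. Distributions: center, spread, and skewness of each numeric column.
print(df.describe())
print(df.skew())

# 2. Correlations: pairwise Pearson coefficients (often drawn as a heatmap).
corr = df.corr()
print(corr.round(2))

# 3. Outliers: Tukey's fences flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
print(df[is_outlier])
```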

Homework: Find a dataset relevant to your work/interests and prepare it for Day 5’s ML session.

Day 5: Introduction to Machine Learning

Objectives: Build, evaluate, and interpret first ML models with scikit-learn.

Time	Topic
09:00–09:30	Homework Presentations — Share EDA findings
09:30–10:30	ML Concepts — Supervised vs. unsupervised learning, train/test split, overfitting, bias-variance tradeoff
10:30–10:45	Break
10:45–12:30	Classification — Logistic regression, decision trees, random forests. scikit-learn API: fit/predict/score
12:30–14:00	Lunch
14:00–15:30	Regression & Evaluation — Linear regression; regression metrics (MSE, R²) and classification metrics (accuracy, precision, recall, F1); cross-validation
15:30–15:45	Break
15:45–16:30	Unsupervised Learning — K-Means clustering, PCA for dimensionality reduction, visualization of clusters
16:30–17:00	Wrap-Up & Next Steps — Recap, resources for continued learning, Q&A, certificates
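A sketch of the Day 5 supervised workflow on scikit-learn's bundled breast cancer dataset, chosen here only because it ships with the library. In this dataset label 1 means benign, so precision, recall, and F1 below refer to the benign class. The unpruned decision tree is included on purpose: a perfect training score next to a lower test score is the overfitting pattern discussed in the ML Concepts session.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of rows; stratify keeps the class balance equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    # Scaling first helps the logistic regression solver converge.
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    # No depth limit: free to memorize the training set (the overfitting demo).
    "unpruned tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # the same fit/predict/score API for every estimator
    y_pred = model.predict(X_test)
    results[name] = {
        "train_acc": model.score(X_train, y_train),
        "test_acc": model.score(X_test, y_test),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        # 5-fold CV on the training split: a less noisy estimate than a single split.
        "cv_acc": cross_val_score(model, X_train, y_train, cv=5).mean(),
    }
    print(name, {k: round(float(v), 3) for k, v in results[name].items()})
```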

Lab 5 (Capstone): An end-to-end mini-project in which participants load a dataset, clean it, explore it, build a predictive model, evaluate it, and present the results. Participants choose from:

Predicting crop yields from climate data
Customer churn classification
Housing price regression
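A compressed sketch of the capstone workflow for the regression options. Because the crop-yield, churn, and housing datasets are not specified here, it assumes scikit-learn's bundled diabetes dataset as a stand-in; swap in your own DataFrame and target column. `DummyRegressor` provides a mean-prediction baseline that any useful model should beat.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load and inspect (stand-in data: 442 patients, 10 numeric features).
X, y = load_diabetes(return_X_y=True, as_frame=True)
print(X.shape, "missing values:", int(X.isna().sum().sum()))

# 2. Split before fitting anything so the test set stays untouched.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 3. Baseline: always predict the training mean.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

# 4. Scaling and model in one pipeline, so the scaler is fit on training data only,
#    including inside each cross-validation fold.
model = make_pipeline(StandardScaler(), LinearRegression())
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_r2 = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")
model.fit(X_train, y_train)

# 5. Evaluate both on the held-out test set.
for name, estimator in [("baseline", baseline), ("linear regression", model)]:
    pred = estimator.predict(X_test)
    print(f"{name}: MSE={mean_squared_error(y_test, pred):.1f}, R²={r2_score(y_test, pred):.3f}")
print(f"5-fold CV R² on the training split: {cv_r2.mean():.3f} ± {cv_r2.std():.3f}")

# 6. Interpret: with standardized inputs, coefficient magnitudes are comparable.
coefs = pd.Series(model[-1].coef_, index=X.columns).sort_values(key=np.abs, ascending=False)
print(coefs.head())
```

For the churn option, the same structure applies with a classifier in the pipeline and classification metrics in step 5.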

Assessment

Daily labs (50%) — Completion and quality of hands-on exercises
Capstone project (30%) — End-to-end analysis on Day 5
Participation (20%) — Engagement in discussions and homework

Resources

Python Documentation
Pandas Documentation
scikit-learn User Guide
Kaggle Datasets
The Shape of Data — Geometry-based ML and data analysis

Certificate

Participants who complete all labs and the capstone project receive a certificate of completion from the instructor.