Python for Data Science — 5-Day Workshop
A 5-day workshop covering Python fundamentals, NumPy, Pandas, data visualization, and machine learning with scikit-learn.
- Instructor: Dr. Yaé Ulrich Gaba
- Duration: 5 days (30 hours)
- Level: Beginner to Intermediate
- Language: English
Overview
This hands-on workshop takes participants from zero Python experience to building their first machine learning models. Through daily coding labs, real-world datasets, and progressive projects, learners develop practical data science skills grounded in solid programming fundamentals.
Prerequisites
- Basic computer literacy (file management, web browsing)
- High school mathematics (algebra, basic statistics)
- No prior programming experience required
- Laptop with internet access (Python will be installed on Day 1)
Learning Objectives
By the end of this workshop, participants will be able to:
- Write Python scripts and use Jupyter notebooks for data analysis
- Manipulate and clean datasets using Pandas
- Create informative visualizations with Matplotlib and Seaborn
- Perform exploratory data analysis (EDA) on real-world datasets
- Build, evaluate, and interpret basic ML models with scikit-learn
Software Requirements
- Python 3.10+
- Jupyter Notebook / JupyterLab
- Libraries: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn
Recommended setup: Anaconda Distribution (includes Python, Jupyter, and all of the libraries listed above)
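To confirm before Day 1 that a laptop meets the requirements above, a short check along these lines may help. It is a minimal sketch, not an official installer test: the helper name `check_environment` and the `REQUIRED` list are assumptions made for illustration, and the package names are the ones the libraries are published under on PyPI. It uses only the standard library, so it still runs when a package is missing.

```python
import sys
from importlib import metadata

# Assumed package names for the libraries listed in this section.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn"]


def check_environment(packages=REQUIRED):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 10)
    status = "OK" if python_ok else "needs 3.10+"
    print(f"Python {sys.version.split()[0]}: {status}")
    for name, version in check_environment().items():
        print(f"{name:>14}: {version or 'MISSING'}")
```

Any line reporting `MISSING` points to a package to install before the first session.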
Day-by-Day Program
Day 1: Python Fundamentals
Objectives: Install Python, understand core syntax, write first programs.
| Time | Topic |
|---|---|
| 09:00–10:30 | Setup & First Steps — Installing Anaconda, launching Jupyter, cells & execution, Markdown basics |
| 10:30–10:45 | Break |
| 10:45–12:30 | Core Syntax — Variables, types (int, float, str, bool), operators, string formatting, input/output |
| 12:30–14:00 | Lunch |
| 14:00–15:30 | Control Flow — Conditionals (if/elif/else), loops (for, while), range(), list comprehensions |
| 15:30–15:45 | Break |
| 15:45–17:00 | Functions & Modules — Defining functions, parameters, return values, importing modules, math, random |
Lab 1: Write a program that analyzes student grades — compute mean, median, min/max, and assign letter grades.
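One possible starting point for Lab 1 is sketched below. The names, scores, and letter-grade cutoffs are all hypothetical and chosen only for illustration. The sketch relies on the standard-library `statistics` module together with the functions and conditionals covered on Day 1.

```python
import statistics

# Hypothetical scores; the lab would use its own class data.
scores = {"Ama": 91, "Kofi": 78, "Zainab": 64, "Tunde": 55, "Lina": 85}


def letter_grade(score):
    """Map a 0-100 score to a letter grade (the cutoffs are an assumption)."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    elif score >= 60:
        return "D"
    else:
        return "F"


values = list(scores.values())
summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "min": min(values),
    "max": max(values),
}

# Assign a letter to every student with a plain loop.
grades = {}
for name, score in scores.items():
    grades[name] = letter_grade(score)

print(summary)
print(grades)
```

Keeping the cutoffs inside one function means a different grading scale only requires changing `letter_grade`.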
Homework: Create a number-guessing game using loops and conditionals.
Day 2: Data Structures & NumPy
Objectives: Master Python collections and numerical computing with NumPy.
| Time | Topic |
|---|---|
| 09:00–09:30 | Homework Review — Discussion and Q&A |
| 09:30–10:30 | Data Structures — Lists, tuples, dictionaries, sets, nesting, common methods |
| 10:30–10:45 | Break |
| 10:45–12:30 | File I/O & Error Handling — Reading/writing CSV and text files, try/except, with statements |
| 12:30–14:00 | Lunch |
| 14:00–15:30 | NumPy Fundamentals — Arrays, shapes, dtypes, indexing, slicing, broadcasting |
| 15:30–15:45 | Break |
| 15:45–17:00 | NumPy Operations — Vectorized operations, aggregations, linear algebra basics, random number generation |
Lab 2: Load a CSV file of weather data using only plain Python (file reading and loops), then repeat the task with NumPy. Compare performance and code readability.
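The comparison in Lab 2 could be set up roughly as follows. This is a sketch under stated assumptions: the weather data is synthetic and generated in memory, the column name `temp_c` is hypothetical, and the two helper functions are illustrative names rather than part of the workshop materials. Timings vary by machine, so no particular winner is implied.

```python
import csv
import io
import time

import numpy as np

# Synthetic stand-in for the weather file: a header plus 50,000 readings.
rng = np.random.default_rng(seed=0)
temps = rng.normal(loc=25.0, scale=5.0, size=50_000).round(2)
csv_text = "day,temp_c\n" + "\n".join(
    f"{i},{t}" for i, t in enumerate(temps.tolist())
)


def mean_pure_python(text):
    """Parse with the csv module and average the temp_c column in a loop."""
    reader = csv.DictReader(io.StringIO(text))
    total, count = 0.0, 0
    for row in reader:
        total += float(row["temp_c"])
        count += 1
    return total / count


def mean_numpy(text):
    """Load the temp_c column into an array and take a vectorized mean."""
    data = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1, usecols=1)
    return data.mean()


for func in (mean_pure_python, mean_numpy):
    start = time.perf_counter()
    result = func(csv_text)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__}: mean={result:.3f} in {elapsed:.4f}s")
```

Both timings include parsing the text, so the lab can also time the arithmetic separately once the data is already in memory.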
Homework: Use NumPy to simulate 10,000 dice rolls and plot the distribution of sums.
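A possible outline for this homework is shown below. It assumes each roll throws two six-sided dice, which the task does not specify, and it prints a text histogram because Matplotlib is only introduced on Day 4.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Assumption: each roll throws two six-sided dice, so sums range from 2 to 12.
rolls = rng.integers(low=1, high=7, size=(10_000, 2))  # high is exclusive
sums = rolls.sum(axis=1)

# counts[s] is the number of rolls whose sum equals s (indices 0 and 1 stay 0).
counts = np.bincount(sums, minlength=13)

for total in range(2, 13):
    share = counts[total] / len(sums)
    bar = "#" * int(share * 200)
    print(f"{total:>2}: {share:6.2%} {bar}")
```

Once plotting has been covered, the same `counts` array can be passed to a bar chart, for example `plt.bar(range(2, 13), counts[2:])`.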
Day 3: Pandas — Data Wrangling
Objectives: Load, clean, transform, and explore datasets with Pandas.
| Time | Topic |
|---|---|
| 09:00–09:30 | Homework Review |
| 09:30–10:30 | Pandas Basics — Series, DataFrame, read_csv, head/tail/info/describe, dtypes |
| 10:30–10:45 | Break |
| 10:45–12:30 | Selection & Filtering — loc/iloc, boolean indexing, query(), column operations, sorting |
| 12:30–14:00 | Lunch |
| 14:00–15:30 | Data Cleaning — Missing values (isna, fillna, dropna), duplicates, type conversion, string methods |
| 15:30–15:45 | Break |
| 15:45–17:00 | Aggregation & Grouping — groupby, agg, pivot_table, merge/join, concat |
Lab 3: Clean and analyze a messy real-world dataset (e.g., World Bank development indicators for African countries). Handle missing values, merge multiple files, and produce summary statistics by country/year.
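The workflow in Lab 3 (handle missing values, merge files, summarize by group) might look like the sketch below. The two tables are tiny, invented placeholders with made-up country codes and values, not real World Bank figures. The column names and the choice to fill gaps with each country's mean are likewise assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Invented placeholder tables standing in for two downloaded files.
growth = pd.DataFrame({
    "country": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "year": [2019, 2020, 2019, 2020, 2019, 2020],
    "gdp_growth": [2.0, np.nan, 4.0, 6.0, 1.0, np.nan],
})
population = pd.DataFrame({
    "country": ["AAA", "BBB", "CCC", "CCC"],
    "year": [2020, 2020, 2019, 2020],
    "population_m": ["10.0", "20.0", "30.0", "31.0"],  # numbers stored as text
})

# 1. Inspect missing values before deciding how to handle them.
print(growth.isna().sum())

# 2. Fill gaps with each country's own mean (one option among several).
growth["gdp_growth"] = growth.groupby("country")["gdp_growth"].transform(
    lambda s: s.fillna(s.mean())
)

# 3. Fix the dtype, then merge; a left join keeps every row of `growth`.
population["population_m"] = population["population_m"].astype(float)
merged = growth.merge(population, on=["country", "year"], how="left")

# 4. Summary statistics by country.
summary = merged.groupby("country").agg(
    mean_growth=("gdp_growth", "mean"),
    max_population_m=("population_m", "max"),
)
print(summary)
```

With real files, `pd.read_csv` would replace the hand-built DataFrames. The left join leaves `NaN` wherever the second file has no matching country and year, which is worth checking before summarizing.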
Homework: Prepare a cleaned dataset and write 5 analytical questions you want to answer with visualization.
Day 4: Data Visualization
Objectives: Create publication-quality plots and perform exploratory data analysis.
| Time | Topic |
|---|---|
| 09:00–09:30 | Homework Review |
| 09:30–10:30 | Matplotlib Fundamentals — figure/axes model, plot(), scatter(), bar(), hist(), customization |
| 10:30–10:45 | Break |
| 10:45–12:30 | Seaborn for Statistical Visualization — histplot, boxplot, heatmap, pairplot, catplot, styling |
| 12:30–14:00 | Lunch |
| 14:00–15:30 | Exploratory Data Analysis (EDA) — Systematic approach: distributions, correlations, outliers, patterns. EDA workflow checklist |
| 15:30–15:45 | Break |
| 15:45–17:00 | Advanced Plots & Storytelling — Subplots, annotations, color palettes, saving figures, dashboard-style layouts |
Lab 4: Perform a complete EDA on the cleaned dataset from Day 3. Answer your 5 questions from the Day 3 homework with appropriate visualizations, then assemble a mini-report that combines narrative and figures.
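A two-panel figure of the kind Lab 4 asks for could be produced along these lines. The data here is synthetic and the output file name `eda_overview.png` is a hypothetical choice. The sketch uses Matplotlib's figure/axes model only, although Seaborn functions could draw onto the same axes.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data standing in for the cleaned Day 3 dataset.
rng = np.random.default_rng(seed=1)
x = rng.normal(loc=50, scale=10, size=300)
y = 0.8 * x + rng.normal(scale=8, size=300)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Panel 1: distribution of a single variable.
ax_hist.hist(x, bins=20, edgecolor="white")
ax_hist.set(title="Distribution of x", xlabel="x", ylabel="count")

# Panel 2: relationship between two variables, titled with the correlation.
r = np.corrcoef(x, y)[0, 1]
ax_scatter.scatter(x, y, s=12, alpha=0.6)
ax_scatter.set(title=f"x vs. y (r = {r:.2f})", xlabel="x", ylabel="y")

fig.tight_layout()
fig.savefig("eda_overview.png", dpi=150)
```

Each panel answers one question (what does the distribution look like, and how are the two variables related), which is a useful pattern for the mini-report.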
Homework: Find a dataset relevant to your work/interests and prepare it for Day 5’s ML session.
Day 5: Introduction to Machine Learning
Objectives: Build, evaluate, and interpret first ML models with scikit-learn.
| Time | Topic |
|---|---|
| 09:00–09:30 | Homework Presentations — Share EDA findings |
| 09:30–10:30 | ML Concepts — Supervised vs. unsupervised learning, train/test split, overfitting, bias-variance tradeoff |
| 10:30–10:45 | Break |
| 10:45–12:30 | Classification — Logistic regression, decision trees, random forests. scikit-learn API: fit/predict/score |
| 12:30–14:00 | Lunch |
| 14:00–15:30 | Regression & Evaluation — Linear regression, regression metrics (MSE, R²), classification metrics (accuracy, precision, recall, F1), cross-validation |
| 15:30–15:45 | Break |
| 15:45–16:30 | Unsupervised Learning — K-Means clustering, PCA for dimensionality reduction, visualization of clusters |
| 16:30–17:00 | Wrap-Up & Next Steps — Recap, resources for continued learning, Q&A, certificates |
Lab 5 (Capstone): End-to-end mini-project: load a dataset, clean it, explore it, build a predictive model, evaluate it, and present results. Participants choose from:
- Predicting crop yields from climate data
- Customer churn classification
- Housing price regression
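For a classification-style capstone such as churn prediction, the model-building and evaluation steps might be sketched as follows. The data is generated with scikit-learn's `make_classification` purely as a placeholder, and the choice of logistic regression with standardized features is one reasonable baseline, not a prescribed solution.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data (a placeholder for real records).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Hold out 20% of rows so evaluation uses data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is fitted on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred)
print(f"test accuracy: {test_accuracy:.3f}")
print(f"test F1:       {test_f1:.3f}")

# 5-fold cross-validation on the training set gives a less noisy estimate.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy:   {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

For the regression options (crop yields, housing prices), the same structure applies with a regressor such as `LinearRegression` and metrics such as `mean_squared_error` and `r2_score`.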
Assessment
- Daily labs (50%) — Completion and quality of hands-on exercises
- Capstone project (30%) — End-to-end analysis on Day 5
- Participation (20%) — Engagement in discussions and homework
Resources
- Python Documentation
- Pandas Documentation
- scikit-learn User Guide
- Kaggle Datasets
- The Shape of Data — Geometry-based ML and data analysis
Certificate
Participants who complete all labs and the capstone project receive a certificate of completion from the instructor.