Week 03 — Version Control — Git and DVC
Code is the easy part. Data and models version differently.
Lecture
Git for ML projects (branches, hooks, submodules) · DVC for data and model versioning · Git LFS, Pachyderm, lakeFS · MLflow Model Registry · the GitOps workflow.
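One way to see why data versions differently from code is a content-addressed pointer file: the large file goes into a cache keyed by its hash, and only a small text record is committed to Git. The block below is a minimal sketch of that idea, not DVC's actual implementation. The `track` helper, the `.pointer.json` suffix, the record fields, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large dataset need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_path: Path, cache_dir: Path) -> Path:
    """Copy the data into a hash-addressed cache and write a pointer file.

    Hypothetical helper: the pointer layout is illustrative only.
    """
    checksum = file_sha256(data_path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / checksum
    if not cached.exists():
        # Identical content is stored once, however often it is tracked.
        shutil.copy2(data_path, cached)
    record = {
        "path": data_path.name,
        "sha256": checksum,
        "size": data_path.stat().st_size,
    }
    pointer = data_path.with_name(data_path.name + ".pointer.json")
    pointer.write_text(json.dumps(record, indent=2))
    return pointer
```

Under this scheme, Git history holds only the pointer, which is a few lines of text that diff cleanly, while each distinct version of the dataset sits in the cache under its own hash.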
Read before the lecture
Code lab
Lab 2 — Versioning the full ML project
Take an existing ML notebook. Version the code in Git, the dataset in DVC, and the trained model artifact in the MLflow Model Registry. Tag a v1.0 release that can be reproduced from scratch.
Notebook: lab02-versioning.ipynb · Dataset: Any prior coursework dataset.
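The lab steps can be sketched as an ordered list of commands plus one registry call. This is a rough outline under stated assumptions, not the lab solution: it assumes a fresh project directory with `git` and `dvc` installed, the dataset path `data/train.csv` is a placeholder, and the helper names `release_commands`, `run_all`, and `register` are hypothetical. The MLflow step assumes the model has already been logged during training and that its model URI is known.

```python
import shlex
import subprocess
from typing import List


def release_commands(dataset: str, tag: str = "v1.0") -> List[List[str]]:
    """Return the Git and DVC commands for one tagged release, in order."""
    return [
        ["git", "init"],
        ["dvc", "init"],
        # Writes a small <dataset>.dvc pointer and git-ignores the raw data.
        ["dvc", "add", dataset],
        # Stages code, notebook, and the DVC pointer; the raw data is ignored.
        ["git", "add", "--all"],
        ["git", "commit", "-m", f"Release {tag}: code and DVC pointer for {dataset}"],
        ["git", "tag", "-a", tag, "-m", f"Reproducible release {tag}"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for command in commands:
        line = shlex.join(command)
        print(line)
        printed.append(line)
        if not dry_run:
            subprocess.run(command, check=True)
    return printed


def register(model_uri: str, name: str):
    """Register an already-logged model in the MLflow Model Registry."""
    import mlflow  # imported lazily so the rest runs without MLflow installed

    return mlflow.register_model(model_uri=model_uri, name=name)


if __name__ == "__main__":
    # Dry run by default: shows the plan without touching the repository.
    run_all(release_commands("data/train.csv"))
```

Keeping the commands as data makes the plan easy to review before anything runs. To check that the tag reproduces from scratch, a fresh clone would check out `v1.0` and restore the data with DVC, which in turn assumes a DVC remote has been configured and pushed to.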
Reference text for this week: chapter 03 of the bilingual notes — EN PDF · FR PDF.