Week 05 — Data Pipelines and Feature Stores

Orchestration: the chain of transformations from raw data to model-ready features, running on a schedule, observable, idempotent.

Lecture

DAGs for data pipelines · Airflow (Beauchemin 2014) · Prefect, Dagster, Metaflow, Kubeflow Pipelines · feature stores (Feast, Hopsworks) · idempotence, backfill, late-arriving data · the training-serving feature parity problem.
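
To make the DAG, idempotence, and backfill topics above concrete, here is a minimal sketch in plain Python. It is not Airflow, Prefect, or any other orchestrator's API: the in-memory `warehouse`, the three task functions, and the `run_pipeline` and `backfill` helpers are all hypothetical names invented for illustration. The one idea it tries to show is that a task which overwrites its own date partition, rather than appending to it, can be re-run safely, which is what makes backfills and reprocessing of late-arriving data tractable.

```python
from datetime import date, timedelta
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical in-memory "warehouse": table name -> {partition date -> data}.
warehouse = {"raw_events": {}, "clean_events": {}, "user_features": {}}


def extract(day):
    # Stand-in for reading a raw source; deterministic for a given day.
    # Assigning to the partition key overwrites any previous run's output.
    warehouse["raw_events"][day] = [
        {"user": "a", "amount": 10.0},
        {"user": "a", "amount": -1.0},  # invalid row, dropped by clean()
        {"user": "b", "amount": 5.0},
    ]


def clean(day):
    rows = warehouse["raw_events"][day]
    warehouse["clean_events"][day] = [r for r in rows if r["amount"] >= 0]


def featurize(day):
    totals = {}
    for r in warehouse["clean_events"][day]:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    warehouse["user_features"][day] = totals


TASKS = {"extract": extract, "clean": clean, "featurize": featurize}

# The DAG: each task maps to the set of tasks it depends on.
DEPENDENCIES = {"extract": set(), "clean": {"extract"}, "featurize": {"clean"}}


def run_pipeline(day):
    """Run every task for one date partition, in dependency order."""
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](day)
    return order


def backfill(start, end):
    """Re-run the pipeline for each day in [start, end], inclusive."""
    day = start
    while day <= end:
        run_pipeline(day)
        day += timedelta(days=1)


if __name__ == "__main__":
    d = date(2024, 1, 1)
    run_pipeline(d)
    run_pipeline(d)  # second run leaves the partition unchanged
    backfill(date(2024, 1, 2), date(2024, 1, 4))
    print(sorted(warehouse["user_features"]))
```

Had `featurize` appended to a shared table instead of overwriting the partition for `day`, the second `run_pipeline(d)` call would have double-counted every user. In this sketch, late-arriving data would be handled the same way as a backfill: re-run the affected partition once the raw source has been corrected.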

Read before the lecture

Recitation — paper discussion

Hermann and Del Balso, *Meet Michelangelo: Uber's Machine Learning Platform* (Uber Engineering, 2017)

Come ready to argue one side of each:

  • What does Michelangelo solve that a notebook + Airflow doesn't?
  • What's the smallest team for which Michelangelo's complexity makes sense?

Reference text for this week: chapter 05 of the bilingual notes — EN PDF · FR PDF.