MLOps · Week 08 of 12

Week 08 — Model Deployment — REST APIs, FastAPI, Streamlit

A model is in production once an HTTP endpoint serves it. This week covers the serving patterns, from prototype demos to high-traffic inference services.

Lecture

REST API patterns for ML (FastAPI, BentoML) · UI overlays (Streamlit, Gradio, HuggingFace Spaces) · Triton Inference Server for high-throughput serving · KServe and Seldon for Kubernetes · canary, blue-green, and shadow-traffic deployment patterns · SLAs (p99 latency, throughput, availability).
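
To make two of the lecture topics concrete, here is a minimal sketch of a canary traffic split and a p99 latency calculation. It is an illustration under stated assumptions, not how KServe, Seldon, or any particular platform implements them: the function names `route` and `percentile` are hypothetical, the split is a hash-based bucket assignment, and the percentile uses the nearest-rank definition (other definitions interpolate and give slightly different values).

```python
import hashlib
import math


def route(request_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of request ids to the canary model.

    Hashing the id (instead of drawing a random number) keeps the
    assignment sticky: the same id always reaches the same version.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds, for illustration only.
    latencies_ms = [12.0, 15.0, 11.0, 240.0, 14.0, 13.0, 16.0, 12.5]
    print("p99 latency (ms):", percentile(latencies_ms, 99))
    print("request 'user-42' goes to:", route("user-42", canary_percent=10))
```

With only a handful of samples the p99 is simply the slowest request, which is why a latency SLA is usually evaluated over a large window of traffic.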

Read before the lecture

FastAPI documentation

Code lab

Lab 5 — Deploy a model end-to-end

Wrap the model from Lab 3 in a FastAPI app. Containerize it. Deploy to the Render or Railway free tier. Verify the endpoint from a fresh notebook. Document the deployment in a one-page README.

Notebook: lab05-deployment.ipynb · Dataset: Same model as Lab 3.
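
For the verification step, a fresh notebook needs nothing beyond the standard library. The sketch below assumes the deployed service accepts a JSON body of the form `{"features": [...]}` on `POST /predict` and answers with `{"prediction": ...}`; that schema and the helper names are hypothetical, so adapt them to whatever your own API exposes.

```python
import json
import urllib.request


def build_predict_request(base_url: str, features: list[float]) -> urllib.request.Request:
    """Build a JSON POST request for the (assumed) /predict route."""
    payload = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(body: bytes) -> float:
    """Extract the prediction from the (assumed) response schema."""
    data = json.loads(body)
    if "prediction" not in data:
        raise ValueError(f"unexpected response body: {data!r}")
    return float(data["prediction"])


def verify_endpoint(base_url: str, features: list[float], timeout: float = 30.0) -> float:
    """Call the deployed endpoint once and return its prediction.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so a
    normal return means the service answered successfully.
    """
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_prediction(response.read())
```

In the notebook, call `verify_endpoint(<your deployed base URL>, [2.0, 4.0, 1.0])` and compare the result with the prediction the same input gives locally. The generous default timeout is deliberate, since a free-tier service may need some time to wake up on the first request.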

Reference text for this week: chapter 08 of the bilingual notes — EN PDF · FR PDF.