Week 08 — Model Deployment — REST APIs, FastAPI, Streamlit
A model is in production once an HTTP endpoint serves it. This week covers the serving patterns that take you from prototype demos to high-traffic inference services.
Lecture
REST API patterns for ML (FastAPI, BentoML) · UI overlays (Streamlit, Gradio, HuggingFace Spaces) · Triton Inference Server for high-throughput serving · KServe and Seldon for Kubernetes · the canary / blue-green / shadow-traffic deployment patterns · SLAs (p99 latency, throughput, availability).
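To make the SLA item concrete, here is a minimal sketch of checking a p99 latency budget against recorded request latencies. It uses the nearest-rank percentile definition, which is one of several conventions. The function names `percentile` and `meets_sla` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Nearest-rank definition: rank = ceil(q/100 * n), clamped to at least 1.
    rank = max(math.ceil(q * len(ordered) / 100), 1)
    return ordered[rank - 1]


def meets_sla(latencies_ms, p99_budget_ms):
    """True when the observed p99 latency is within the (assumed) budget."""
    return percentile(latencies_ms, 99) <= p99_budget_ms


# Illustrative data only: 100 requests taking 1..100 ms.
latencies = list(range(1, 101))
print(percentile(latencies, 50), percentile(latencies, 99))
print(meets_sla(latencies, p99_budget_ms=120))
```

The p99 is preferred over the mean here because a handful of slow requests can hide behind a healthy average while still breaching the tail-latency target.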
Read before the lecture
Code lab
Lab 5 — Deploy a model end-to-end
Wrap the model from Lab 3 in FastAPI. Containerize. Deploy to Render or Railway free tier. Verify the endpoint from a fresh notebook. Document the deployment in a 1-page README.
Notebook: lab05-deployment.ipynb · Dataset: Same model as Lab 3.
Reference text for this week: chapter 08 of the bilingual notes — EN PDF · FR PDF.