Week 03 — GPT and Text Generation
The GPT family, from the 117M-parameter GPT-1 (2018) to today's frontier models reported to reach the trillion-parameter scale, plus the inference-time engineering that makes deployment feasible.
Lecture
The decoder-only transformer in detail · training objectives (causal LM, MLM, span infilling) · sampling strategies (temperature, top-$k$, top-$p$, beam search) · KV cache · speculative decoding · batched inference.
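As a warm-up for the sampling part of the lecture, here is a minimal sketch of how temperature, top-$k$, and top-$p$ (nucleus) filtering can be combined when drawing a single next token. The function name `sample_next_token` and its parameters are hypothetical, chosen for illustration only. It works on a plain list of logits using the standard library, whereas a real decoder would operate on tensors.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token index from raw logits (hypothetical helper, not a library API)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidate token indices, most probable first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept

    # random.choices renormalises the surviving weights implicitly.
    weights = [probs[i] for i in order]
    return rng.choices(order, weights=weights, k=1)[0]
```

Calling `sample_next_token([2.0, 1.0, 0.1], top_k=1)` reduces to greedy decoding, since only the most probable token survives the filter. Beam search is not covered by this sketch because it tracks several partial sequences rather than sampling one token at a time.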
Read before the lecture
- Radford et al., *Improving Language Understanding by Generative Pre-Training* (OpenAI 2018, GPT-1)
- Brown et al., *Language Models are Few-Shot Learners* (NeurIPS 2020, GPT-3)
Code lab
Lab 2 — Sampling strategy analysis
On a 1B-parameter open model (e.g., Pythia-1B), generate completions for the same prompt under five sampling strategies. Quantify diversity, coherence, and factuality with simple metrics.
Notebook: lab02-sampling.ipynb · Dataset: —
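One simple diversity metric that could serve as a starting point is distinct-$n$: the fraction of unique $n$-grams among all $n$-grams in a set of generations. The sketch below is an assumption about what "simple metrics" might include, not the lab's prescribed method. The name `distinct_n` is hypothetical, and whitespace tokenisation is used only to keep the example self-contained.

```python
def distinct_n(texts, n=2):
    """Fraction of unique n-grams across a list of generated strings.

    Hypothetical helper: tokens are obtained by whitespace splitting,
    which is a simplification of a real tokenizer.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()
        # All contiguous n-grams in this text (empty if the text is shorter than n).
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    # Return 0.0 when no n-grams exist, to avoid dividing by zero.
    return len(unique) / total if total else 0.0
```

For example, `distinct_n(["a b a b"], n=2)` gives 2/3, because the three bigrams contain only two distinct ones. Higher values suggest more varied output. Coherence and factuality need other measures and are not addressed here.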
Reference text for this week: chapter 03 of the bilingual notes — EN PDF · FR PDF.