Week 01 — Foundations of Language Models
What a language model is, what it isn't, and why ChatGPT was the convergence of three decade-long research programs.
Lecture
From $n$-gram models to neural LMs · the language-modeling objective (next-token prediction) · perplexity · the convergence of the transformer + scale + RLHF · scaling laws (Kaplan 2020, Hoffmann 2022) · what ‘capability’ and ‘alignment’ mean.
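As a small warm-up for the perplexity topic above, here is a minimal sketch assuming the standard definition: perplexity is the exponential of the average negative log-probability the model assigns to each actual next token. The function name `perplexity` and the probability values are hypothetical and chosen only for illustration; they do not come from the lecture materials.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-probability) over a token sequence.

    token_probs: probabilities (each in (0, 1]) that a model assigned to
    the token that actually came next at each position.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural log).
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical example: the model gives every correct token probability 0.25,
# as if it were guessing uniformly among four options.
uniform_guesses = [0.25, 0.25, 0.25, 0.25]
print(perplexity(uniform_guesses))
```

A model that guesses uniformly among four options has a perplexity of about 4, which is why perplexity is often read as an effective branching factor: lower values mean the model is less "surprised" by the text.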
Read before the lecture
- Kaplan et al., *Scaling Laws for Neural Language Models* (2020)
- Hoffmann et al., *Training Compute-Optimal Large Language Models* (NeurIPS 2022, the Chinchilla paper)
Reference text for this week: chapter 01 of the bilingual notes — EN PDF · FR PDF.