Week 07 — LLMs and generative AI
Module 7: transformer internals, prompt engineering, LoRA fine-tuning, RAG over a domain corpus, agents with tool use.
What's under the hood of GPT/Claude/LLaMA, what you can actually do with them, and where they fail.
What you ship this week
Pick one of three tracks: a RAG system over a domain corpus, a LoRA fine-tune of a small (1-8B-parameter) open model on a domain task, or a multi-step tool-using agent.
| | |
|---|---|
| Due | Friday 18:00 (Africa/Lagos, UTC+1) |
| Submission | Drop the repo URL into the week's cohort channel. Peer-review pairing is announced the following Monday. |
| Rubric | Pass / revise. A pass requires green CI, tests covering the public API, and a README a stranger can follow to install and run the code. |
Live sessions and labs
Default weekly cadence below. Cohort-specific dates and Zoom links are filled in at intake.
| Day | Time | Block | Recording |
|---|---|---|---|
| Mon | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Mon | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Tue | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Tue | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Wed | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Wed | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Thu | 09:00-12:00 | Live instruction + code-along | (post-session) |
| Thu | 14:00-16:00 | Independent lab work + TA office hours | (post-session) |
| Fri | 10:00-11:00 | Industry speaker | (post-session) |
| Fri | 11:30-12:30 | Lab review | (post-session) |
| Fri | 14:00-15:00 | Cohort retrospective | (post-session) |
Learning outcomes
By the end of the week, every participant will:
- Understand the transformer architecture as it appears in modern LLMs.
- Apply prompt engineering and structured-output techniques effectively (a structured-output sketch follows this list).
- Fine-tune a small open-source LLM with LoRA/QLoRA on a domain dataset.
- Build a retrieval-augmented generation (RAG) system.
- Evaluate generation honestly, recognizing when an LLM is genuinely useful and when it is confabulating.
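One concrete pattern behind the structured-output outcome: request JSON from the model and validate it against a schema before trusting it. A minimal sketch using Pydantic v2; the `Triage` schema and its fields are hypothetical, and `raw` stands in for a model reply:

```python
# Validate model output against a schema instead of parsing free text.
# Triage is a hypothetical schema; `raw` stands in for the model's JSON reply.
from pydantic import BaseModel, ValidationError

class Triage(BaseModel):
    urgency: int            # 1 (routine) to 5 (emergency)
    condition: str
    referral_needed: bool

raw = '{"urgency": 4, "condition": "suspected malaria", "referral_needed": true}'
try:
    record = Triage.model_validate_json(raw)  # Pydantic v2 API
except ValidationError as err:
    print("Schema check failed; retry or repair the output:", err)
else:
    print(record.urgency, record.condition, record.referral_needed)
```

On validation failure, a common tactic is to feed the error message back to the model and ask for a corrected reply.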
Topics covered
- The transformer, the attention mechanism (see the equation below), and scaling laws
- Pretraining, fine-tuning, and RLHF/DPO at survey level
- Prompt engineering, structured output, and function calling
- Parameter-efficient fine-tuning (LoRA, QLoRA, PEFT)
- RAG: chunking, embedding, retrieval, reranking, generation
- Diffusion models (intuition and practical use)
- Evaluation: BLEU, ROUGE, LLM-as-judge, human eval, and why all of these are partial
- Agentic systems and tool use
- Safety, alignment, hallucination, and bias
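For reference, the attention mechanism above is scaled dot-product attention as defined in Vaswani et al.:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V
$$

where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension; dividing by $\sqrt{d_k}$ keeps the softmax logits from saturating as dimensionality grows.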
Labs
Lab 1 — RAG over a domain corpus
Index a set of WHO/AFRO health reports in ChromaDB. Build retrieval + reranking + grounded generation. Evaluate faithfulness, context relevance, and answer correctness on a 30-question held-out set.
Dataset: WHO/AFRO PDF reports (open access).
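A minimal sketch of the Lab 1 retrieval path, assuming ChromaDB's default embedding function and pre-chunked report text. The collection name and helper functions are illustrative; reranking and the final generation call are left to the lab:

```python
# Index pre-chunked WHO/AFRO report text and retrieve context for a question.
# ChromaDB embeds documents with its default model when none is specified.
import chromadb

client = chromadb.PersistentClient(path="./rag_index")
collection = client.get_or_create_collection("who_afro_reports")

def index_chunks(chunks: list[str]) -> None:
    """Store text chunks; the ids are illustrative."""
    collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

def retrieve(question: str, k: int = 5) -> list[str]:
    """Return the k chunks nearest to the question embedding."""
    results = collection.query(query_texts=[question], n_results=k)
    return results["documents"][0]

def build_grounded_prompt(question: str) -> str:
    """Assemble a prompt that confines the model to the retrieved context."""
    context = "\n\n".join(retrieve(question))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

A cross-encoder reranker would typically sit between `retrieve` and prompt assembly, and the grounded prompt goes to whichever chat model the team chooses.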
Lab 2 — LoRA fine-tune on a domain task
Fine-tune Llama 3.1 8B or Mistral 7B with LoRA on a domain dataset (medical Q&A, legal contract review, or African-language instruction following). Report perplexity and task-specific accuracy on a 50-example test set.
Dataset: Choose one of three pre-curated splits.
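A minimal sketch of the LoRA setup with Hugging Face PEFT. The checkpoint, rank, alpha, and target modules are illustrative starting points rather than required settings; for QLoRA, the base model would additionally be loaded in 4-bit via bitsandbytes:

```python
# Wrap a 7B base model with LoRA adapters so only the low-rank updates train.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mistralai/Mistral-7B-v0.1"  # illustrative; any lab checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of base weights
```

Training then proceeds with a standard causal-LM loop (e.g. the `transformers` Trainer) over the chosen split.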
Lab 3 — Multi-step tool-using agent
Build an agent, with LangGraph or plain function calling, that combines search, a calculator, and code execution. Stress-test it on 20 multi-step queries and document the failure modes.
Dataset: Synthetic + live web tools.
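Whatever the framework, Lab 3 reduces to a propose-dispatch-observe loop. A minimal sketch in which `propose_action` stands in for the model step and both tools are stubs; all names here are illustrative:

```python
# Core agent loop: the model proposes a tool call or a final answer;
# tools run locally and their observations feed back into the next step.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"(stub) top results for: {q}",
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only
}

def propose_action(question: str, history: list[str]) -> dict:
    """Stub for the LLM step: emits a tool request, then a final answer."""
    if not history:
        return {"tool": "calculator", "input": "17 * 23"}
    return {"answer": f"Computed from observations: {history[-1]}"}

def run_agent(question: str, max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        action = propose_action(question, history)
        if "answer" in action:            # the model decided it is done
            return action["answer"]
        observation = TOOLS[action["tool"]](action["input"])
        history.append(observation)
    return "Stopped after max_steps"      # a failure mode worth documenting

print(run_agent("What is 17 * 23?"))
```

The stress test then amounts to swapping the stub for a real model call and logging where the loop stalls, picks the wrong tool, or mangles tool input.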
Readings
Mandatory
- Before Tuesday. Vaswani et al., *Attention Is All You Need*
- Before Wednesday. Brown et al., *Language Models are Few-Shot Learners* (the GPT-3 paper, sections 1-3)
- Before Thursday. Lewis et al., *Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks*
- Before Friday. Hu et al., *LoRA: Low-Rank Adaptation of Large Language Models*
Optional deepening
- Anthropic, *Building effective agents* (engineering blog 2024)
- Karpathy, *Let's build GPT: from scratch, in code, spelled out*