Week 07 — LLMs and generative AI

Module 7: transformer internals, prompt engineering, LoRA fine-tuning, RAG over a domain corpus, agents with tool use.


What's under the hood of GPT/Claude/LLaMA, what you can actually do with them, and where they fail.

What you ship this week

Pick one of three tracks: a RAG system over a domain corpus, a LoRA fine-tune of a 1-8B-parameter model on a domain task, or a multi-step tool-using agent.

Due: Friday 18:00 Africa/Lagos (UTC+1).
Submission: drop the repo URL into the week's cohort channel. Peer-review pairing is announced the following Monday.
Rubric: pass / revise. A pass requires green CI, tests covering the public API, and a README a stranger can follow to install and run the code.

Live sessions and labs

Default weekly cadence below. Cohort-specific dates and Zoom links fill in at intake.

| Day | Time        | Block                                  | Recording    |
|-----|-------------|----------------------------------------|--------------|
| Mon | 09:00-12:00 | Live instruction + code-along          | post-session |
| Mon | 14:00-16:00 | Independent lab work + TA office hours | post-session |
| Tue | 09:00-12:00 | Live instruction + code-along          | post-session |
| Tue | 14:00-16:00 | Independent lab work + TA office hours | post-session |
| Wed | 09:00-12:00 | Live instruction + code-along          | post-session |
| Wed | 14:00-16:00 | Independent lab work + TA office hours | post-session |
| Thu | 09:00-12:00 | Live instruction + code-along          | post-session |
| Thu | 14:00-16:00 | Independent lab work + TA office hours | post-session |
| Fri | 10:00-11:00 | Industry speaker                       | post-session |
| Fri | 11:30-12:30 | Lab review                             | post-session |
| Fri | 14:00-15:00 | Cohort retrospective                   | post-session |

Learning outcomes

By the end of the week, every participant will:

  1. Understand the transformer architecture as it appears in modern LLMs.
  2. Apply prompt engineering and structured-output techniques effectively.
  3. Fine-tune a small open-source LLM with LoRA/QLoRA on a domain dataset.
  4. Build a retrieval-augmented generation (RAG) system.
  5. Evaluate generation honestly, distinguishing when an LLM is genuinely useful from when it is confabulating.

Topics covered

The transformer, attention mechanism, scaling laws · pretraining, fine-tuning, RLHF/DPO at survey level · prompt engineering, structured output, function calling · parameter-efficient fine-tuning (LoRA, QLoRA, PEFT) · RAG: chunking, embedding, retrieval, reranking, generation · diffusion models (intuition + practical use) · evaluation: BLEU, ROUGE, LLM-as-judge, human eval, why all of these are partial · agentic systems and tool use · safety, alignment, hallucination, bias.
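To make the "all metrics are partial" point concrete, here is a minimal ROUGE-1 F1 sketch in plain Python (a toy illustration with whitespace tokenization, not the reference implementation):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate string."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# A correct paraphrase with no word overlap scores 0 — one reason
# n-gram metrics alone are insufficient for judging generation.
print(rouge1_f1("the cat sat on the mat", "the cat sat on the mat"))   # 1.0
print(rouge1_f1("the cat sat on the mat", "a feline rested atop a rug"))  # 0.0
```

The second call is exactly the failure mode the week's evaluation topic covers: surface-overlap metrics punish paraphrase and reward verbatim copying.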

Labs

Lab 1 — RAG over a domain corpus

Index a set of WHO/AFRO health reports in ChromaDB. Build retrieval + reranking + grounded generation. Evaluate faithfulness, context relevance, and answer correctness on a 30-question held-out set.

Dataset: WHO/AFRO PDF reports (open access).
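The retrieval step of the pipeline can be sketched in plain Python. This is a toy stand-in: `embed` here is a bag-of-words counter and the corpus lives in a list, whereas the lab uses a real embedding model with vectors stored in ChromaDB. The chunk texts are invented examples, not actual report content.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the lab replaces this with a real
    # embedding model and persists the vectors in ChromaDB.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Malaria incidence declined in the region during 2022.",
    "Cholera outbreaks were reported in three member states.",
    "Routine immunization coverage recovered after the pandemic.",
]
print(retrieve("malaria trends in 2022", chunks, k=1))
```

In the full lab this top-k list is reranked, then passed to the generator as grounding context; faithfulness is then scored against exactly these retrieved chunks.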

Lab 2 — LoRA fine-tune on a domain task

Fine-tune Llama 3.1 8B or Mistral 7B with LoRA on a domain dataset (medical Q&A, legal contract review, or African-language instruction following). Report perplexity and task-specific accuracy on a 50-example evaluation set.

Dataset: Choose one of three pre-curated splits.
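The core LoRA idea fits in a few lines: the frozen weight W is adapted as W' = W + (alpha / r) * B @ A, where A is r x d_in, B is d_out x r, and r is much smaller than the weight's dimensions. The sketch below uses toy 2x2 matrices and pure Python; in the lab a library such as PEFT applies this per attention weight matrix.

```python
def matmul(X, Y):
    """Naive matrix multiply for small nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_update(W, A, B, alpha: float, r: int):
    """Return W + (alpha / r) * B @ A, the LoRA-adapted weight."""
    delta = matmul(B, A)  # low-rank update, shape d_out x d_in
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight, 2 x 2
A = [[1.0, 1.0]]               # r = 1, d_in = 2 (trainable)
B = [[0.5], [0.5]]             # d_out = 2, r = 1 (trainable)
print(lora_update(W, A, B, alpha=2.0, r=1))  # [[2.0, 1.0], [1.0, 2.0]]
```

Only A and B are trained, so the trainable-parameter count scales with r rather than with the full weight matrix, which is what makes fine-tuning a 7-8B model feasible on a single GPU.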

Lab 3 — Multi-step tool-using agent

Build an agent with LangGraph or native function calling that combines search, calculator, and code-execution tools. Stress-test it on 20 multi-step queries and document its failure modes.

Dataset: Synthetic + live web tools.
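The skeleton of a tool-use loop looks like this: the model proposes a tool call, the runtime executes it and feeds the observation back. In this sketch a scripted list of calls stands in for the LLM's decisions, and `run_agent` and `TOOLS` are illustrative names, not LangGraph API; in the lab the LLM chooses the tool and arguments at each step.

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr: str) -> float:
    """Safely evaluate a basic arithmetic expression via the AST
    (no eval(), so arbitrary code cannot run)."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

TOOLS = {"calculator": calculator}  # search/code-exec tools would register here

def run_agent(steps):
    """Execute a scripted sequence of (tool, argument) calls and
    collect the observations fed back to the model."""
    observations = []
    for tool, arg in steps:
        observations.append(TOOLS[tool](arg))
    return observations

print(run_agent([("calculator", "12 * 7"), ("calculator", "84 / 4")]))
```

Most of the documented failure modes in Lab 3 live in the gap this sketch hides: the model picking the wrong tool, malformed arguments, and error observations the model fails to recover from.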

Readings

Mandatory

Optional deepening

Builds on (course catalogue)