Week 07 — LLMs and generative AI

Module 7: transformer internals, prompt engineering, LoRA fine-tuning, RAG over a domain corpus, agents with tool use.


What's under the hood of GPT/Claude/LLaMA, what you can actually do with them, and where they fail.

What you ship this week

Pick one of three tracks: a RAG system over a domain corpus, a LoRA fine-tune of a 1-8B-parameter model on a domain task, or a multi-step tool-using agent.

Due: Friday 18:00 Africa/Lagos (UTC+1).
Submission: drop the repo URL into the week's cohort channel. Peer-review pairing is announced the following Monday.
Rubric: pass / revise. A pass requires green CI, tests covering the public API, and a README a stranger can follow to install and run the code.

Live sessions and labs

Default weekly cadence below. Cohort-specific dates and Zoom links fill in at intake.

| Day | Time        | Block                                  | Recording    |
|-----|-------------|----------------------------------------|--------------|
| Mon | 09:00-12:00 | Live instruction + code-along          | post-session |
| Mon | 14:00-16:00 | Independent lab work + TA office hours | post-session |
| Tue | 09:00-12:00 | Live instruction + code-along          | post-session |
| Tue | 14:00-16:00 | Independent lab work + TA office hours | post-session |
| Wed | 09:00-12:00 | Live instruction + code-along          | post-session |
| Wed | 14:00-16:00 | Independent lab work + TA office hours | post-session |
| Thu | 09:00-12:00 | Live instruction + code-along          | post-session |
| Thu | 14:00-16:00 | Independent lab work + TA office hours | post-session |
| Fri | 10:00-11:00 | Industry speaker                       | post-session |
| Fri | 11:30-12:30 | Lab review                             | post-session |
| Fri | 14:00-15:00 | Cohort retrospective                   | post-session |

Learning outcomes

By the end of the week, every participant will:

  1. Understand the transformer architecture as it appears in modern LLMs.
  2. Apply prompt engineering and structured-output techniques effectively.
  3. Fine-tune a small open-source LLM with LoRA/QLoRA on a domain dataset.
  4. Build a retrieval-augmented generation (RAG) system.
  5. Evaluate generation honestly, distinguishing when an LLM is genuinely useful from when it is confabulating.

Topics covered

The transformer, attention mechanism, scaling laws · pretraining, fine-tuning, RLHF/DPO at survey level · prompt engineering, structured output, function calling · parameter-efficient fine-tuning (LoRA, QLoRA, PEFT) · RAG: chunking, embedding, retrieval, reranking, generation · diffusion models (intuition + practical use) · evaluation: BLEU, ROUGE, LLM-as-judge, human eval, why all of these are partial · agentic systems and tool use · safety, alignment, hallucination, bias.
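To make the "all metrics are partial" point concrete, here is a minimal ROUGE-1 F1 sketch in plain Python (a toy illustration with whitespace tokenization, not the reference implementation):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate string."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# A correct paraphrase with no word overlap scores 0 — one reason
# n-gram metrics alone are insufficient for judging generation.
print(rouge1_f1("the cat sat on the mat", "the cat sat on the mat"))   # 1.0
print(rouge1_f1("the cat sat on the mat", "a feline rested atop a rug"))  # 0.0
```

The second call is exactly the failure mode the week's evaluation topic covers: surface-overlap metrics punish paraphrase and reward verbatim copying.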

Labs

Lab 1 — RAG over a domain corpus

Index a set of WHO/AFRO health reports in ChromaDB. Build retrieval + reranking + grounded generation. Evaluate faithfulness, context relevance, and answer correctness on a 30-question held-out set.

Dataset: WHO/AFRO PDF reports (open access).
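The retrieval step of the pipeline can be sketched in plain Python. This is a toy stand-in: `embed` here is a bag-of-words counter and the corpus lives in a list, whereas the lab uses a real embedding model with vectors stored in ChromaDB. The chunk texts are invented examples, not actual report content.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the lab replaces this with a real
    # embedding model and persists the vectors in ChromaDB.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Malaria incidence declined in the region during 2022.",
    "Cholera outbreaks were reported in three member states.",
    "Routine immunization coverage recovered after the pandemic.",
]
print(retrieve("malaria trends in 2022", chunks, k=1))
```

In the full lab this top-k list is reranked, then passed to the generator as grounding context; faithfulness is then scored against exactly these retrieved chunks.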

Lab 2 — LoRA fine-tune on a domain task

Fine-tune Llama 3.1 8B or Mistral 7B with LoRA on a domain dataset (medical Q&A, legal contract review, or African-language instruction following). Report perplexity and task-specific accuracy on a 50-example evaluation set.

Dataset: Choose one of three pre-curated splits.
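The core LoRA idea fits in a few lines: the frozen weight W is adapted as W' = W + (alpha / r) * B @ A, where A is r x d_in, B is d_out x r, and r is much smaller than the weight's dimensions. The sketch below uses toy 2x2 matrices and pure Python; in the lab a library such as PEFT applies this per attention weight matrix.

```python
def matmul(X, Y):
    """Naive matrix multiply for small nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_update(W, A, B, alpha: float, r: int):
    """Return W + (alpha / r) * B @ A, the LoRA-adapted weight."""
    delta = matmul(B, A)  # low-rank update, shape d_out x d_in
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight, 2 x 2
A = [[1.0, 1.0]]               # r = 1, d_in = 2 (trainable)
B = [[0.5], [0.5]]             # d_out = 2, r = 1 (trainable)
print(lora_update(W, A, B, alpha=2.0, r=1))  # [[2.0, 1.0], [1.0, 2.0]]
```

Only A and B are trained, so the trainable-parameter count scales with r rather than with the full weight matrix, which is what makes fine-tuning a 7-8B model feasible on a single GPU.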

Lab 3 — Multi-step tool-using agent

Build an agent with LangGraph or native function calling that combines search, calculator, and code-execution tools. Stress-test it on 20 multi-step queries and document its failure modes.

Dataset: Synthetic + live web tools.
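The skeleton of a tool-use loop looks like this: the model proposes a tool call, the runtime executes it and feeds the observation back. In this sketch a scripted list of calls stands in for the LLM's decisions, and `run_agent` and `TOOLS` are illustrative names, not LangGraph API; in the lab the LLM chooses the tool and arguments at each step.

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr: str) -> float:
    """Safely evaluate a basic arithmetic expression via the AST
    (no eval(), so arbitrary code cannot run)."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

TOOLS = {"calculator": calculator}  # search/code-exec tools would register here

def run_agent(steps):
    """Execute a scripted sequence of (tool, argument) calls and
    collect the observations fed back to the model."""
    observations = []
    for tool, arg in steps:
        observations.append(TOOLS[tool](arg))
    return observations

print(run_agent([("calculator", "12 * 7"), ("calculator", "84 / 4")]))
```

Most of the documented failure modes in Lab 3 live in the gap this sketch hides: the model picking the wrong tool, malformed arguments, and error observations the model fails to recover from.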

Readings

Mandatory

Optional deepening

Builds on (course catalogue)