Lab 3 — LoRA fine-tune of a small open model¶

Goal. Fine-tune Mistral-7B or Llama-3.1-8B with LoRA on a domain dataset. Compare zero-shot vs fine-tuned on a 50-example task-specific eval.

What you ship. Notebook with LoRA adapter weights, training loss curve, and a side-by-side eval table on a held-out 50-example test set.

Setup¶

Install the dependencies (one-time).

In [ ]:
# !pip install torch transformers peft accelerate bitsandbytes datasets
In [ ]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset
from trl import SFTTrainer

torch.manual_seed(42)

Pick a domain dataset¶

The default below is a small medical-Q&A dataset; substitute your own if you have one.

In [ ]:
DATASET = 'medalpaca/medical_meadow_medqa'
BASE_MODEL = 'mistralai/Mistral-7B-Instruct-v0.2'
ds = load_dataset(DATASET, split='train').select(range(1000))
test = load_dataset(DATASET, split='train').select(range(1000, 1050))
print('train:', len(ds), 'test:', len(test))

Exercise 1 — Zero-shot baseline¶

In [ ]:
# YOUR TURN
# Load BASE_MODEL in 4-bit. Run zero-shot on the 50-example test set.
# Record exact-match accuracy.

Exercise 2 — LoRA fine-tune¶

In [ ]:
# YOUR TURN
# Configure LoraConfig(r=16, lora_alpha=32, target_modules=['q_proj','v_proj']).
# Train with SFTTrainer for 200-500 steps on the 1000-example train set.

Exercise 3 — Side-by-side eval¶

In [ ]:
# YOUR TURN
# Re-run on the same 50-example test set with the fine-tuned model.
# Print a table: base accuracy vs fine-tuned accuracy.

Done?¶

Submit per the cohort schedule. Peer review pairing announced the following Monday.