Lab 2: Sampling-strategy analysis
Goal. On a 1B-parameter open model, generate continuations of the same prompt under five sampling strategies, then quantify diversity, coherence, and factuality with simple metrics.
What you ship. A notebook with 5 sampling strategies × 10 samples each (50 continuations in total), plus a small evaluation table reporting the three metrics for each strategy.
Setup
Install the dependencies (one-time).
In [ ]:
# !pip install torch transformers accelerate matplotlib
In [ ]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import numpy as np
import matplotlib.pyplot as plt
torch.manual_seed(42)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
Load a small open model
In [ ]:
MODEL = 'EleutherAI/pythia-1b'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).to(device).eval()
print('parameters:', sum(p.numel() for p in model.parameters()) / 1e9, 'B')
Exercise 1: Generate under each sampling strategy
In [ ]:
PROMPT = 'The capital of Cameroon is'
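# generate() may apply a default top_k (historically 50) whenever do_sample=True,
# so top_k=0 is set explicitly wherever top-k filtering should be switched off.
STRATEGIES = {
    'greedy': {'do_sample': False},
    'temp_0.5': {'do_sample': True, 'temperature': 0.5, 'top_k': 0},
    'temp_1.0': {'do_sample': True, 'temperature': 1.0, 'top_k': 0},
    'top_k_50': {'do_sample': True, 'top_k': 50},
    'top_p_0.9': {'do_sample': True, 'top_p': 0.9, 'top_k': 0},
}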
# YOUR TURN — generate 10 continuations per strategy at max_new_tokens=80.
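Before running the model, it can help to see what each strategy does to a next-token distribution. The following is a minimal pure-Python sketch, not the library's implementation: the five logits are made up for illustration, the helper names (`softmax`, `renormalize`, `top_k_filter`, `top_p_filter`) are hypothetical, and `k=2` is used instead of 50 only because the toy vocabulary has five tokens. Real libraries may treat ties and the top-p boundary slightly differently.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def renormalize(probs, keep):
    """Zero out every index not in `keep`, then rescale so the rest sums to 1."""
    mass = sum(probs[i] for i in keep)
    return [p / mass if i in keep else 0.0 for i, p in enumerate(probs)]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, set(ranked[:k]))

def top_p_filter(probs, p):
    """Keep the smallest set of top-ranked tokens whose cumulative mass reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in ranked:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)

# Hypothetical logits for a 5-token vocabulary (not taken from any model).
toy_logits = [2.0, 1.0, 0.5, 0.0, -1.0]

# Greedy decoding ignores the distribution and takes the arg-max token.
greedy_choice = max(range(len(toy_logits)), key=lambda i: toy_logits[i])

dist_t05 = softmax(toy_logits, temperature=0.5)  # sharper than T=1.0
dist_t10 = softmax(toy_logits, temperature=1.0)  # the unmodified distribution
dist_top2 = top_k_filter(dist_t10, k=2)          # only the 2 best tokens survive
dist_top_p = top_p_filter(dist_t10, p=0.9)       # nucleus covering 90% of the mass

for name, dist in [('temp_0.5', dist_t05), ('temp_1.0', dist_t10),
                   ('top_k_2', dist_top2), ('top_p_0.9', dist_top_p)]:
    print(name, [round(prob, 3) for prob in dist])
```

Sampling then draws from the filtered distribution. Greedy decoding is deterministic, so its 10 samples will be identical, which gives a useful floor for the diversity metric in Exercise 2.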
Exercise 2: Quantify diversity
In [ ]:
# YOUR TURN
# For each strategy, compute distinct-2: unique bigrams divided by total bigrams, pooled over that strategy's 10 samples.
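One possible way to compute distinct-2 is sketched below. It assumes lowercase whitespace tokenization (you may prefer the model's tokenizer), and both the `distinct_n` helper and the toy samples are hypothetical stand-ins for your own generations.

```python
def distinct_n(samples, n=2):
    """Unique n-grams divided by total n-grams, pooled over all samples."""
    unique, total = set(), 0
    for text in samples:
        tokens = text.lower().split()  # assumption: lowercase whitespace tokens
        # Build n-grams inside one sample only; they never span two samples.
        ngrams = list(zip(*(tokens[i:] for i in range(n))))
        unique.update(ngrams)
        total += len(ngrams)
    return len(unique) / total if total else 0.0

# Made-up samples: identical outputs versus fully different outputs.
toy_samples = {
    'greedy': ['the capital is here', 'the capital is here'],
    'sampled': ['the capital is here', 'a city lies north'],
}
scores = {name: distinct_n(texts) for name, texts in toy_samples.items()}
print(scores)
```

Because the score pools bigrams across samples, repeated samples pull it down, while longer or more varied samples push it toward 1.0.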
Exercise 3: Quantify coherence and factuality
In [ ]:
# YOUR TURN
# Coherence (proxy): mean per-token perplexity of each sample under the same model; lower means more fluent.
# Factuality: does the sample name 'Yaoundé' as the capital? Check manually or with a string match (allow the unaccented 'Yaounde').
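Both metrics reduce to small formulas, sketched here in pure Python under stated assumptions. Perplexity is the exponential of the negative mean log-probability per token; the log-probabilities below are made up, and in the notebook you would obtain them from the model (with Hugging Face causal LMs, passing `labels=input_ids` returns the mean cross-entropy loss, whose exponential is the perplexity over prompt plus continuation). The helper names `perplexity`, `strip_accents`, and `mentions_answer` are hypothetical.

```python
import math
import unicodedata

def perplexity(token_logprobs):
    """exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError('need at least one token log-probability')
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def strip_accents(text):
    """Lowercase and drop combining marks so 'Yaoundé' matches 'yaounde'."""
    decomposed = unicodedata.normalize('NFKD', text.lower())
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def mentions_answer(sample, answer='Yaoundé'):
    """Crude factuality check: accent-insensitive substring match."""
    return strip_accents(answer) in strip_accents(sample)

# Made-up example: a model that gives every token probability 0.25.
uniform_ppl = perplexity([math.log(0.25)] * 8)

# Made-up continuations for the string-match check.
hits = [mentions_answer(s) for s in ['Yaounde, in the Centre Region.',
                                     'Douala, the largest city.']]
print(uniform_ppl, hits)
```

A substring match is only a rough signal: it would also accept a sample that says the capital is not Yaoundé, so spot-check a few samples by hand.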
Done?
Submit per the cohort schedule. Peer-review pairings are announced the following Monday.