Lab 2 — Sampling-strategy analysis

Goal. On a 1B-parameter open model, generate continuations of the same prompt under five sampling strategies, then quantify diversity, coherence, and factuality with simple metrics.

What you ship. A notebook with 5 sampling strategies × 10 samples each, plus a small evaluation table that reports the three metrics per strategy.

Setup

Install the dependencies (one-time). The install line is commented out; uncomment it on the first run.

In [ ]:
# !pip install torch transformers accelerate matplotlib
In [ ]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(42)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

Load a small open model

In [ ]:
MODEL = 'EleutherAI/pythia-1b'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).to(device).eval()
print('parameters:', sum(p.numel() for p in model.parameters()) / 1e9, 'B')

Exercise 1 — Generate under each sampling strategy

In [ ]:
PROMPT = 'The capital of Cameroon is'
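# Note: in transformers 4.x the default generation config applies top_k=50
# whenever do_sample=True. Without an explicit top_k=0, 'temp_1.0' and
# 'top_k_50' would be the same configuration, and top-p would be combined
# with top-k. top_k=0 disables top-k filtering.
STRATEGIES = {
  'greedy':    {'do_sample': False},
  'temp_0.5':  {'do_sample': True, 'temperature': 0.5, 'top_k': 0},
  'temp_1.0':  {'do_sample': True, 'temperature': 1.0, 'top_k': 0},
  'top_k_50':  {'do_sample': True, 'top_k': 50},
  'top_p_0.9': {'do_sample': True, 'top_p': 0.9, 'top_k': 0},
}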

# YOUR TURN — generate 10 continuations per strategy at max_new_tokens=80.
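One possible shape for the generation loop is sketched here; treat it as a starting point rather than the reference solution. The helper name `generate_samples` is hypothetical. It assumes a Hugging Face causal LM and tokenizer like the ones loaded above, and it passes `pad_token_id` explicitly on the assumption that the tokenizer defines no pad token. Greedy decoding is deterministic, and recent transformers versions reject `num_return_sequences > 1` without sampling, so the sketch decodes once for greedy and repeats the result.

```python
def generate_samples(model, tokenizer, prompt, gen_kwargs,
                     n_samples=10, max_new_tokens=80):
    """Return n_samples decoded continuations (prompt included) for one strategy.

    Hypothetical helper: `model` and `tokenizer` are assumed to follow the
    Hugging Face generate / batch_decode interface.
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    sampling = gen_kwargs.get("do_sample", False)

    outputs = model.generate(
        **inputs,
        **gen_kwargs,
        max_new_tokens=max_new_tokens,
        # Greedy decoding yields a single deterministic sequence.
        num_return_sequences=n_samples if sampling else 1,
        # Assumption: no pad token is defined, so reuse EOS for padding.
        pad_token_id=tokenizer.eos_token_id,
    )
    texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    if not sampling:
        # Repeat the single greedy output so every strategy has n_samples rows.
        texts = texts * n_samples
    return texts
```

With the notebook's variables, the samples for every strategy could then be collected with `samples = {name: generate_samples(model, tokenizer, PROMPT, kw) for name, kw in STRATEGIES.items()}`. The decoded text includes the prompt, which matters when computing metrics later.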

Exercise 2 — Quantify diversity

In [ ]:
# YOUR TURN
# For each strategy, compute distinct-2: the number of unique bigrams divided by
# the total number of bigrams, pooled over that strategy's 10 samples.
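A minimal sketch of distinct-n follows, assuming whitespace tokenization (tokenizer IDs would work equally well and are arguably more consistent with the model). The function name `distinct_n` is hypothetical.

```python
def distinct_n(samples, n=2):
    """Unique n-grams divided by total n-grams, pooled over all samples.

    Assumption: tokens are whitespace-separated words.
    """
    total = 0
    unique = set()
    for text in samples:
        tokens = text.split()
        # Slide a window of length n over the token list.
        ngrams = list(zip(*(tokens[i:] for i in range(n))))
        total += len(ngrams)
        unique.update(ngrams)
    # No n-grams at all (empty or very short samples): report 0.0.
    return len(unique) / total if total else 0.0
```

Two caveats when reading the numbers: ten identical greedy samples score at most 0.1 by construction, and if the samples still contain the shared prompt, its bigrams lower every strategy's score by a similar amount.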

Exercise 3 — Quantify coherence and factuality

In [ ]:
# YOUR TURN
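# Coherence (proxy): perplexity of each sample under the same model, i.e.
# exp(mean per-token negative log-likelihood), averaged over the strategy's samples.
# Lower perplexity means the model finds the text more predictable.
# Factuality: does the sample name 'Yaoundé' as the capital? Check manually or with
# a string match (consider accepting the unaccented 'Yaounde' as well).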
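Both metrics could be sketched as below, under stated assumptions. `perplexity` and `mentions_answer` are hypothetical helper names. The perplexity helper assumes a Hugging Face causal LM that returns the mean cross-entropy loss when `labels` is passed, and it scores the whole text it is given (prompt plus continuation). The factuality helper is a plain substring match that ignores case and accents, so it cannot tell "is Yaoundé" from "is not Yaoundé"; a manual spot check remains worthwhile.

```python
import math
import unicodedata


def perplexity(model, tokenizer, text):
    """exp(mean per-token negative log-likelihood) of `text` under `model`."""
    import torch  # imported lazily so mentions_answer works without torch

    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # With labels given, the model returns the mean cross-entropy loss
        # over next-token predictions.
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())


def _normalize(text):
    """Casefold and strip combining accents, so 'Yaoundé' matches 'yaounde'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def mentions_answer(sample, answer="Yaoundé"):
    """True if the sample contains the answer, ignoring case and accents."""
    return _normalize(answer) in _normalize(sample)
```

Per strategy, the factuality score would then be the fraction of samples for which `mentions_answer` returns `True`, and the coherence score the mean of `perplexity` over the samples.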

Done?

Submit per the cohort schedule. Peer-review pairings will be announced the following Monday.