Lab 2 — Hyperbolic embedding of a taxonomy¶
Goal. Embed WordNet's mammal subtree in 2-D Euclidean and 2-D Poincaré-ball space. Compare nearest-neighbor structure.
What you ship. Two scatter plots (Euclidean vs Poincaré), a precision@10 table for retrieving ancestor nodes, and a 200-word memo on when hyperbolic space helps and when it doesn't.
Setup¶
Install the dependencies (one-time): uncomment the line below and run the cell.
In [ ]:
# !pip install torch nltk matplotlib scikit-learn
In [ ]:
import torch
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import nltk
from nltk.corpus import wordnet as wn
nltk.download('wordnet', quiet=True)
torch.manual_seed(42)
WordNet mammal subtree¶
In [ ]:
from collections import deque

root = wn.synset('mammal.n.01')

def descendants(s, depth=4):
    """Map each synset name within `depth` hyponym hops of `s` to its depth."""
    out = {s.name(): 0}
    frontier = deque([(s, 0)])
    while frontier:
        # Breadth-first order, so each synset is recorded at its shallowest depth.
        node, d = frontier.popleft()
        if d >= depth:
            continue
        for child in node.hyponyms():
            if child.name() not in out:
                out[child.name()] = d + 1
                frontier.append((child, d + 1))
    return out

nodes = descendants(root)
name_to_idx = {n: i for i, n in enumerate(nodes)}

# Directed (parent, child) index pairs for hyponymy links inside the subtree.
edges = []
for name in nodes:
    for child in wn.synset(name).hyponyms():
        if child.name() in name_to_idx:
            edges.append((name_to_idx[name], name_to_idx[child.name()]))

print(f'{len(nodes)} synsets, {len(edges)} hyponymy edges')
Exercise 1 — Euclidean baseline (random init + edge attraction)¶
In [ ]:
# YOUR TURN
# Initialize 2-D embeddings, optimize so connected pairs are close and
# non-connected pairs are far (use a simple contrastive or InfoNCE loss).
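Hint. Either loss needs "non-connected" pairs to push apart. Here is a minimal negative-sampling sketch in plain Python, not a full solution. The name `sample_negatives`, the parameters `k` and `seed`, and the choice to treat edges as undirected are all assumptions made for illustration; adapt them to your own training loop.

```python
import random

def sample_negatives(edges, num_nodes, k=5, seed=0):
    """For each (u, v) edge, draw k node indices that are not linked to u.

    Hypothetical helper: edges are treated as undirected when deciding
    what counts as "connected", and u itself is never returned.
    """
    rng = random.Random(seed)
    linked = {i: {i} for i in range(num_nodes)}
    for u, v in edges:
        linked[u].add(v)
        linked[v].add(u)

    negatives = []
    for u, _ in edges:
        candidates = [n for n in range(num_nodes) if n not in linked[u]]
        if not candidates:
            negatives.append([])            # u is linked to every node
        elif len(candidates) >= k:
            negatives.append(rng.sample(candidates, k))   # without replacement
        else:
            negatives.append(rng.choices(candidates, k=k))  # with replacement
    return negatives

# Toy graph with 5 nodes: 0-1, 0-2, 1-3 (node 4 is isolated).
toy_edges = [(0, 1), (0, 2), (1, 3)]
toy_negs = sample_negatives(toy_edges, num_nodes=5, k=2)
```

On the lab data you would call it as `sample_negatives(edges, len(nodes))`. The candidate list is rebuilt for every edge, which is simple but quadratic; that should be acceptable at this graph size, though you may prefer rejection sampling.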
Exercise 2 — Poincaré embedding¶
In [ ]:
# YOUR TURN
# Implement Nickel-Kiela Poincaré embedding using the hyperbolic distance:
# d(u,v) = arccosh(1 + 2 * |u-v|^2 / ((1 - |u|^2)(1 - |v|^2)))
# Train with Riemannian SGD on the same edges.
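Hint. Before vectorizing in torch, it can help to have a slow reference to test against. The sketch below evaluates the distance formula above on single points in plain Python and shows one Riemannian SGD update as commonly described for the Poincaré ball: rescale the Euclidean gradient by (1 - |θ|²)² / 4, then pull the point back inside the ball if it escaped. The names `poincare_distance`, `rsgd_step`, and the margin `EPS` are assumptions for illustration, not part of the lab's required interface.

```python
import math

EPS = 1e-5  # assumed margin that keeps points strictly inside the unit ball

def sq_norm(x):
    """Squared Euclidean norm of a point given as a sequence of floats."""
    return sum(c * c for c in x)

def poincare_distance(u, v):
    """d(u,v) = arccosh(1 + 2 |u-v|^2 / ((1 - |u|^2)(1 - |v|^2)))."""
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

def rsgd_step(theta, euclid_grad, lr):
    """One Riemannian SGD update for a single point in the Poincaré ball."""
    # Convert the Euclidean gradient to the Riemannian one.
    scale = (1.0 - sq_norm(theta)) ** 2 / 4.0
    new = [t - lr * scale * g for t, g in zip(theta, euclid_grad)]
    # Project back into the open unit ball if the step left it.
    norm = math.sqrt(sq_norm(new))
    if norm >= 1.0:
        new = [c / norm * (1.0 - EPS) for c in new]
    return new

# From the origin, the distance to a point at radius r equals 2 * artanh(r).
d_half = poincare_distance((0.0, 0.0), (0.5, 0.0))
small_step = rsgd_step((0.0, 0.0), (1.0, 0.0), lr=0.4)
clipped = rsgd_step((0.9, 0.0), (-100.0, 0.0), lr=1.0)
```

Checking your torch implementation against `poincare_distance` on a handful of random points is a cheap way to catch sign and broadcasting mistakes. Note how the gradient scale shrinks toward zero near the boundary, which is why points there move slowly.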
Exercise 3 — Compare visually and quantitatively¶
In [ ]:
# YOUR TURN
# Plot both embeddings. For each non-root node, retrieve the 10 nearest
# neighbors and compute precision@10 against true ancestors. Print both
# numbers in a table.
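Hint. The metric itself can be sketched independently of either embedding. The plain-Python version below derives each node's ancestor set from (parent, child) pairs and scores the k nearest neighbors under any distance function you pass in. The names `ancestor_sets` and `precision_at_k` are assumptions for illustration, and dividing by k (rather than by the number of ancestors) is one possible convention.

```python
import math

def ancestor_sets(edges, num_nodes):
    """Map each node index to the set of all its ancestors.

    Assumes `edges` holds (parent, child) index pairs.
    """
    parents = {i: set() for i in range(num_nodes)}
    for p, c in edges:
        parents[c].add(p)
    ancestors = {}
    for node in range(num_nodes):
        seen, stack = set(), list(parents[node])
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(parents[p])
        ancestors[node] = seen
    return ancestors

def precision_at_k(points, ancestors, dist, k=10):
    """Mean fraction of each node's k nearest neighbors that are ancestors."""
    scores = []
    for i, anc in ancestors.items():
        if not anc:
            continue  # skip the root (and any node with no ancestors)
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(points[i], points[j]))
        hits = sum(j in anc for j in others[:k])
        scores.append(hits / k)
    return sum(scores) / len(scores)

# Toy chain 0 -> 1 -> 2 -> 3 laid out on a line with growing gaps.
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
toy_anc = ancestor_sets(toy_edges, num_nodes=4)
toy_p1 = precision_at_k(toy_points, toy_anc, math.dist, k=1)
```

Because the subtree is at most four levels deep, most nodes have far fewer than 10 ancestors, so precision@10 under this convention is capped well below 1 for both embeddings. Compare the two numbers with each other rather than with 1.0, and say in your memo which convention you used.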
Done?¶
Submit per the cohort schedule. Peer-review pairings will be announced the following Monday.