Lab 2 — Hyperbolic embedding of a taxonomy

Goal. Embed WordNet's mammal subtree in 2-D Euclidean space and in the 2-D Poincaré ball, then compare the nearest-neighbor structure of the two embeddings.

What you ship. Two scatter plots (Euclidean vs Poincaré), a precision@10 table for retrieving ancestor nodes, and a 200-word memo on when hyperbolic embeddings help and when they don't.

Setup

Install the dependencies (one time only; uncomment the line below to run it).

In [ ]:
# !pip install torch nltk matplotlib scikit-learn
In [ ]:
import torch
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import nltk
from nltk.corpus import wordnet as wn

nltk.download('wordnet', quiet=True)
torch.manual_seed(42)

WordNet mammal subtree

In [ ]:
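from collections import deque

root = wn.synset('mammal.n.01')

def descendants(s, depth=4):
    """Map each synset name within `depth` hyponym hops of `s` to its depth."""
    out = {s.name(): 0}
    frontier = deque([(s, 0)])
    while frontier:
        # Breadth-first order, so each node is recorded at its minimum depth
        # even when WordNet gives it more than one parent.
        node, d = frontier.popleft()
        if d >= depth:
            continue
        for child in node.hyponyms():
            if child.name() not in out:
                out[child.name()] = d + 1
                frontier.append((child, d + 1))
    return out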

nodes = descendants(root)
name_to_idx = {n: i for i, n in enumerate(nodes)}
edges = []
for name in nodes:
    for child in wn.synset(name).hyponyms():
        if child.name() in name_to_idx:
            edges.append((name_to_idx[name], name_to_idx[child.name()]))
print(f'{len(nodes)} synsets, {len(edges)} hyponymy edges')
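
Exercise 3 scores retrieval against each node's true ancestors, so it can help to derive ancestor sets from `edges` at this point. The following is a minimal sketch, assuming `edges` holds `(parent_idx, child_idx)` pairs as built above; `ancestor_sets` is a hypothetical helper name, and the toy edge list stands in for the real graph so the cell runs on its own.

```python
from collections import defaultdict

def ancestor_sets(edges, num_nodes):
    """Map each node index to the set of all its ancestors.

    Assumes `edges` holds (parent_idx, child_idx) pairs forming a DAG.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    ancestors = {}

    def collect(node):
        # Memoized walk up the hierarchy; a node may have several parents.
        if node not in ancestors:
            found = set()
            for p in parents[node]:
                found.add(p)
                found |= collect(p)
            ancestors[node] = found
        return ancestors[node]

    for node in range(num_nodes):
        collect(node)
    return ancestors

# Toy hierarchy: 0 -> 1 -> 3 and 0 -> 2 -> 3 (node 3 has two parents).
toy_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
toy_anc = ancestor_sets(toy_edges, 4)
print(toy_anc[3])
```

On the real graph the call would be `ancestor_sets(edges, len(nodes))`. Because `edges` only links synsets inside the extracted subtree, the ancestors found this way stop at the mammal root.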

Exercise 1 — Euclidean baseline (random init + edge attraction)

In [ ]:
# YOUR TURN
# Initialize 2-D embeddings at random, then optimize them so that connected
# pairs end up close and non-connected pairs end up far apart (use a simple
# contrastive or InfoNCE loss).
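
One possible starting point for the loss, offered as a sketch rather than a reference solution. It assumes an InfoNCE-style objective in which negative squared Euclidean distances serve as logits and negatives are drawn uniformly at random. The function name `edge_nce_loss`, the toy chain, and the hyperparameters are all illustrative choices. The imports repeat those of the setup cell so the block runs on its own.

```python
import torch
import torch.nn.functional as F

def edge_nce_loss(emb, edges, num_neg=5):
    """InfoNCE-style loss: each edge's child should be closer to its parent
    than `num_neg` uniformly sampled nodes are.

    Assumption: negatives are sampled uniformly and may occasionally collide
    with true neighbors; a stricter sampler would filter those out.
    """
    u = emb[edges[:, 0]]                                  # (E, 2) parent vectors
    v = emb[edges[:, 1]]                                  # (E, 2) child vectors
    neg_idx = torch.randint(0, emb.shape[0], (edges.shape[0], num_neg))
    negs = emb[neg_idx]                                   # (E, num_neg, 2)

    # Squared distances keep the gradient finite when two points coincide.
    pos_d = (u - v).pow(2).sum(dim=-1, keepdim=True)      # (E, 1)
    neg_d = (u.unsqueeze(1) - negs).pow(2).sum(dim=-1)    # (E, num_neg)

    # Negated distances act as logits; the true child sits at index 0.
    logits = -torch.cat([pos_d, neg_d], dim=1)
    target = torch.zeros(edges.shape[0], dtype=torch.long)
    return F.cross_entropy(logits, target)

# Toy run on a 4-node chain 0 -> 1 -> 2 -> 3.
torch.manual_seed(0)
toy_edges = torch.tensor([[0, 1], [1, 2], [2, 3]])
emb = torch.nn.Parameter(0.1 * torch.randn(4, 2))
opt = torch.optim.Adam([emb], lr=0.05)
losses = []
for _ in range(200):
    opt.zero_grad()
    loss = edge_nce_loss(emb, toy_edges)
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

For the lab graph, `torch.tensor(edges)` would replace `toy_edges` and the embedding table would have `len(nodes)` rows. Because negatives are resampled at every step, the loss curve is noisy, so judge convergence from a running average rather than single values.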

Exercise 2 — Poincaré embedding

In [ ]:
# YOUR TURN
# Implement the Nickel & Kiela Poincaré embedding using the hyperbolic distance
# d(u,v) = arccosh(1 + 2 * |u-v|^2 / ((1 - |u|^2)(1 - |v|^2))),
# where |.| is the Euclidean norm and every point satisfies |u| < 1.
# Train with Riemannian SGD on the same edges.
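
The distance formula above and a Riemannian SGD update can be sketched as follows. This is a hedged sketch, not the definitive implementation. It assumes the commonly used update that rescales the Euclidean gradient by (1 - |theta|^2)^2 / 4 and then projects any escaped point back inside the unit ball. The margin `EPS` and the names `poincare_distance` and `rsgd_step` are illustrative.

```python
import torch

EPS = 1e-5  # assumed numerical margin keeping points strictly inside the unit ball

def poincare_distance(u, v):
    """Hyperbolic distance on the Poincaré ball, broadcasting over leading dims."""
    sq_dist = (u - v).pow(2).sum(dim=-1)
    u_gap = (1 - u.pow(2).sum(dim=-1)).clamp_min(EPS)
    v_gap = (1 - v.pow(2).sum(dim=-1)).clamp_min(EPS)
    x = 1 + 2 * sq_dist / (u_gap * v_gap)
    # arccosh has an infinite slope at 1, so nudge the argument just above it.
    return torch.acosh(x.clamp_min(1 + EPS))

@torch.no_grad()
def rsgd_step(theta, lr):
    """One Riemannian SGD update using the Euclidean gradient in theta.grad."""
    # Rescale by the inverse of the Poincaré metric, (1 - |theta|^2)^2 / 4.
    scale = (1 - theta.pow(2).sum(dim=-1, keepdim=True)).pow(2) / 4
    theta -= lr * scale * theta.grad
    # Project any point that left the ball back to just inside the boundary.
    norm = theta.norm(dim=-1, keepdim=True).clamp_min(EPS)
    theta.mul_(torch.where(norm >= 1, (1 - EPS) / norm, torch.ones_like(norm)))
    theta.grad.zero_()

# Sanity check: from the origin, d(0, r) = 2 * artanh(r), so r = 0.5 gives ln 3.
origin = torch.zeros(2)
point = torch.tensor([0.5, 0.0])
d = poincare_distance(origin, point)

# One update on two toy points; both must remain inside the ball afterwards.
theta = torch.tensor([[0.9, 0.0], [0.0, 0.1]], requires_grad=True)
poincare_distance(theta[0], theta[1]).backward()
rsgd_step(theta, lr=0.1)
```

The clamp on the arccosh argument means the distance from a point to itself comes out slightly above zero rather than exactly zero. That trade-off keeps gradients finite and does not affect neighbor rankings.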

Exercise 3 — Compare visually and quantitatively

In [ ]:
# YOUR TURN
# Plot both embeddings. For each non-root node, retrieve its 10 nearest
# neighbors under that embedding's own distance and compute precision@10
# against its true ancestors. Report the mean precision@10 of each embedding
# in a table.
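
The retrieval metric described above can be sketched independently of the geometry by passing in a precomputed pairwise distance matrix. This is a minimal sketch. The helper name `precision_at_k` and its `relevant` argument (a dict from node index to the set of that node's true ancestors) are assumptions, and the four points on a line are toy data.

```python
import numpy as np

def precision_at_k(dist, relevant, k=10):
    """Mean precision@k over query nodes that have at least one relevant item.

    dist     : (n, n) array of pairwise distances in either geometry.
    relevant : dict mapping node index -> set of relevant node indices
               (for this lab, each node's true ancestors).
    """
    dist = np.array(dist, dtype=float)   # copy so the caller's matrix is untouched
    np.fill_diagonal(dist, np.inf)       # a node must not retrieve itself
    scores = []
    for query, targets in relevant.items():
        if not targets:                  # e.g. the root has no ancestors
            continue
        top_k = np.argsort(dist[query])[:k]
        hits = sum(int(j) in targets for j in top_k)
        scores.append(hits / k)
    return float(np.mean(scores)) if scores else float("nan")

# Toy data: a chain 0 -> 1 -> 2 -> 3 embedded at distinct points on a line.
pos = np.array([0.0, 1.0, 3.0, 7.0])
toy_dist = np.abs(pos[:, None] - pos[None, :])
toy_relevant = {0: set(), 1: {0}, 2: {0, 1}, 3: {0, 1, 2}}
p_at_2 = precision_at_k(toy_dist, toy_relevant, k=2)
```

Dividing by `k` rather than by the number of ancestors follows the usual definition of precision@k, but it means a node with m < k ancestors can score at most m/k. Keep that ceiling in mind when interpreting the table for a shallow subtree.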

Done?

Submit per the cohort schedule. Peer-review pairings will be announced the following Monday.