Lab 1 — Your first GNN: GCN on Cora
Goal. Train a 2-layer GCN on the Cora citation network and compare it against a logistic-regression baseline that uses the same node features.
What you ship. A notebook with training curves, the final test accuracy of both models, and a 200-word memo on what the GCN gains over the bag-of-features baseline.
Setup
Install the dependencies (one-time).
In [ ]:
# !pip install torch torch_geometric scikit-learn matplotlib
In [ ]:
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np
import matplotlib.pyplot as plt  # for the training curves you ship

# Fix random seeds so runs are more reproducible.
torch.manual_seed(42)
np.random.seed(42)
The Cora citation network
In [ ]:
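# Downloads Cora on the first run; the dataset holds a single graph.
dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]
# num_edges counts directed edges: each undirected citation is stored once per direction.
print('Nodes:', data.num_nodes, 'Edges:', data.num_edges, 'Features:', data.num_features, 'Classes:', dataset.num_classes)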
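PyG keeps the graph's connectivity in `data.edge_index`, a 2 × E integer tensor in COO format: row 0 holds source nodes and row 1 holds target nodes. Because Cora's citations are treated as undirected, each one is stored in both directions. The NumPy sketch below builds such an array for a hypothetical four-paper graph to show the layout; it does not load the real dataset.

```python
import numpy as np

# Hypothetical toy citation graph: undirected edges between 4 papers.
undirected = [(0, 1), (1, 2), (2, 3)]

# Store every undirected edge in both directions, as Cora's edge_index does.
src = [u for u, v in undirected] + [v for u, v in undirected]
dst = [v for u, v in undirected] + [u for u, v in undirected]
edge_index = np.array([src, dst])  # shape (2, E): row 0 = sources, row 1 = targets

num_edges = edge_index.shape[1]
print(edge_index)
print("stored (directed) edges:", num_edges)
print("undirected citations:", num_edges // 2)
```

On the real data, `data.is_undirected()` should return `True`, consistent with this both-directions storage.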
Exercise 1 — Logistic-regression baseline on node features
In [ ]:
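# The baseline sees each node's bag-of-words features in isolation; edges are ignored.
X = data.x.numpy()
y = data.y.numpy()
train_mask = data.train_mask.numpy()  # public split: 140 labelled training nodes
test_mask = data.test_mask.numpy()    # 1,000 held-out test nodes
clf = LogisticRegression(max_iter=2000)
clf.fit(X[train_mask], y[train_mask])
preds = clf.predict(X[test_mask])
print('Logistic baseline accuracy:', accuracy_score(y[test_mask], preds))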
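Both models are scored on the same test nodes, so it helps to compute accuracy the same way for each. A minimal NumPy sketch of masked accuracy; the helper name `masked_accuracy` is hypothetical, and when `pred` covers every node it gives the same number as the `accuracy_score(y[test_mask], preds)` call above.

```python
import numpy as np

def masked_accuracy(pred, y, mask):
    """Fraction of nodes selected by the boolean `mask` whose predicted label matches `y`.

    `pred` and `y` hold one integer label per node; `mask` picks the nodes to score.
    """
    pred, y, mask = np.asarray(pred), np.asarray(y), np.asarray(mask, dtype=bool)
    return float((pred[mask] == y[mask]).mean())

# Toy example: 5 nodes, only nodes 2-4 are scored.
y = np.array([0, 1, 2, 2, 1])
pred = np.array([0, 0, 2, 1, 1])
mask = np.array([False, False, True, True, True])
print(masked_accuracy(pred, y, mask))
```

For the GCN, `pred` would be the row-wise argmax of the model's output over all nodes (for a hypothetical output tensor `out`, `out.argmax(dim=1).numpy()`), and `mask` would be `data.test_mask.numpy()`.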
Exercise 2 — Implement a 2-layer GCN
In [ ]:
# YOUR TURN
# Define a GCN with two GCNConv layers, a ReLU between them, and dropout.
# class GCN(torch.nn.Module): ...
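For intuition while you implement the model: with its default settings, `GCNConv` applies the standard GCN propagation rule $H' = \hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2} H W$, where $\hat{A} = A + I$ adds self-loops and $\hat{D}$ is the degree matrix of $\hat{A}$, plus a learnable bias. The dense NumPy sketch below computes that rule on a toy graph. The function names are hypothetical, the bias and dropout are omitted, and your model should still use `GCNConv`.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D^-1/2 (A + I) D^-1/2 for a dense, symmetric adjacency matrix A."""
    A_hat = A + np.eye(A.shape[0])  # add self-loops
    deg = A_hat.sum(axis=1)         # degree of each node in A_hat (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale row i by d_i^-1/2 and column j by d_j^-1/2.
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One GCN step without bias: mix each node with its neighbors, then project."""
    return A_norm @ H @ W

def two_layer_gcn(A, X, W0, W1):
    """Class scores of a 2-layer GCN: A_norm ReLU(A_norm X W0) W1 (no dropout, no bias)."""
    A_norm = normalized_adjacency(A)
    H = np.maximum(gcn_layer(A_norm, X, W0), 0.0)  # ReLU between the two layers
    return gcn_layer(A_norm, H, W1)

# Toy path graph 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_norm = normalized_adjacency(A)
print(np.round(A_norm, 3))

rng = np.random.default_rng(0)
X = np.eye(3)  # one-hot node features
logits = two_layer_gcn(A, X, rng.normal(size=(3, 4)), rng.normal(size=(4, 2)))
print(logits.shape)  # one score per node per class
```

The symmetric normalization keeps high-degree nodes from dominating the sum: each neighbor's contribution is divided by the square root of both endpoints' degrees.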
Exercise 3 — Train and report test accuracy
In [ ]:
# YOUR TURN
# Train for 200 epochs with Adam(lr=0.01). Track val accuracy.
# Report final test accuracy. Compare to the logistic baseline.
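The GCN runs its forward pass over every node, since it needs neighbors' features, but in this semi-supervised setting the loss should use only the labelled training nodes. A NumPy sketch of that masked cross-entropy, assuming the model returns one row of unnormalized class scores (logits) per node; the function name is hypothetical. In PyTorch the same idea is usually written as `F.cross_entropy(out[data.train_mask], data.y[data.train_mask])`, where `out` stands for your model's output.

```python
import numpy as np

def masked_cross_entropy(logits, y, mask):
    """Mean cross-entropy over the nodes selected by `mask`.

    logits: (num_nodes, num_classes) unnormalized scores for every node.
    y:      (num_nodes,) integer labels; only the masked entries are used.
    mask:   (num_nodes,) boolean, True for nodes whose labels may be trained on.
    """
    mask = np.asarray(mask, dtype=bool)
    z = logits[mask]
    # Numerically stable log-softmax: subtract each row's max before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Log-probability of each masked node's true class.
    picked = log_probs[np.arange(z.shape[0]), np.asarray(y)[mask]]
    return float(-picked.mean())

# Toy check: 4 nodes, 7 classes (as in Cora); only nodes 0 and 2 are training nodes.
logits = np.zeros((4, 7))  # uniform scores, so every class gets probability 1/7
y = np.array([3, 0, 5, 1])
train_mask = np.array([True, False, True, False])
print(masked_cross_entropy(logits, y, train_mask))  # log(7) for uniform scores
```

Labels of the unmasked nodes never enter the loss, which is why the validation and test labels stay unseen during training.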
Exercise 4 — Inspect failure cases
In [ ]:
# YOUR TURN
# Find 5 test-set nodes the GCN gets wrong. Look at their neighborhoods.
# Hypothesize why and write a short note.
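One way to start on the failure cases is to list each misclassified node's neighbors from `edge_index` and check what fraction share its true label; a node whose neighbors mostly carry a different label is a plausible place for neighborhood aggregation to mislead the GCN. A minimal sketch on a hypothetical toy graph; the helper names are illustrative, and on Cora you would pass `data.edge_index.numpy()` and `data.y.numpy()`.

```python
import numpy as np

def neighbors(edge_index, node):
    """Targets of all edges leaving `node` in a (2, E) COO edge_index.

    For an undirected graph stored in both directions (as Cora is),
    this is the node's full neighbor list.
    """
    return edge_index[1][edge_index[0] == node]

def same_label_fraction(edge_index, y, node):
    """Share of `node`'s neighbors whose label equals y[node] (NaN if it has none)."""
    nbrs = neighbors(edge_index, node)
    if nbrs.size == 0:
        return float("nan")
    return float((y[nbrs] == y[node]).mean())

# Hypothetical toy graph: node 0 (label 0) has mostly label-1 neighbors.
undirected = [(0, 1), (0, 2), (0, 3), (1, 2)]
src = [u for u, v in undirected] + [v for u, v in undirected]
dst = [v for u, v in undirected] + [u for u, v in undirected]
edge_index = np.array([src, dst])
y = np.array([0, 1, 1, 0])

for node in range(len(y)):
    print(node, sorted(neighbors(edge_index, node).tolist()),
          same_label_fraction(edge_index, y, node))
```

Comparing this fraction for the misclassified nodes with its typical value across the test set is one way to ground the hypothesis in your note.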
Done?
Submit per the cohort schedule. Peer-review pairings are announced the following Monday.