Lab 2 — Vectorizing persistence diagrams¶
Goal. Take diagrams from Lab 1, compute persistence landscapes and persistence images, and use them as features for a sklearn classifier that distinguishes the three point-cloud types.
What you ship. A notebook with two classifiers (one on landscapes, one on images) and a confusion matrix for each, plus a 200-word memo on which vectorization works better on this dataset and why.
Setup¶
Install the dependencies once: uncomment the line below the first time you run the notebook.
In [ ]:
# !pip install ripser persim scikit-learn matplotlib numpy
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from ripser import ripser
from persim import PersistenceImager, plot_diagrams
from persim.landscapes import PersLandscapeApprox, plot_landscape_simple
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
rng = np.random.default_rng(42)
Datasets — 90 point clouds, three classes (30 each)¶
In [ ]:
def noisy_circle(n=200, r=1.0, noise=0.05, seed=None):
    """Sample n points on a circle of radius r, perturbed by Gaussian noise."""
    r_rng = np.random.default_rng(seed)
    theta = r_rng.uniform(0, 2*np.pi, n)
    pts = np.column_stack([r*np.cos(theta), r*np.sin(theta)])
    return pts + noise * r_rng.normal(size=pts.shape)

def two_circles(n=300, sep=2.5, noise=0.05, seed=None):
    """Two noisy unit circles whose centres are sep apart along the x-axis."""
    r_rng = np.random.default_rng(seed)
    # Pass the generator itself so both circles draw from one stream.
    # np.random.default_rng returns an existing Generator unchanged, so this
    # works for seed=0 and keeps clouds with adjacent seeds from sharing a circle.
    c1 = noisy_circle(n//2, noise=noise, seed=r_rng)
    c2 = noisy_circle(n//2, noise=noise, seed=r_rng) + np.array([sep, 0.0])
    return np.vstack([c1, c2])

def random_blob(n=200, noise=0.4, seed=None):
    """Isotropic Gaussian blob with no loop structure."""
    r_rng = np.random.default_rng(seed)
    return r_rng.normal(scale=noise, size=(n, 2))

X_data, y_data = [], []
for i in range(30):
    X_data.append(noisy_circle(seed=i)); y_data.append(0)
    X_data.append(two_circles(seed=100+i)); y_data.append(1)
    X_data.append(random_blob(seed=200+i)); y_data.append(2)
y_data = np.array(y_data)
print('dataset:', len(X_data), 'point clouds,', np.bincount(y_data), 'per class')
Exercise 1 — Compute persistence diagrams for all 90 clouds¶
In [ ]:
diagrams = []
for X in X_data:
    res = ripser(X, maxdim=1)     # Vietoris-Rips persistence for H_0 and H_1
    diagrams.append(res['dgms'])  # res['dgms'][k] is the (birth, death) array for H_k
print(f'{len(diagrams)} diagrams computed')
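Each entry of `diagrams` is a list `[H_0, H_1]` of `(birth, death)` arrays, and the H_0 array returned by ripser contains one bar with death equal to infinity (the component that never merges). Vectorizers generally expect finite bars, so it can help to filter before going on. The helper below is a minimal sketch of that step. The name `finite_lifetimes` and the toy array are illustrative assumptions, not part of the lab code.

```python
import numpy as np

def finite_lifetimes(dgm):
    """Drop bars with a non-finite death and return (clean diagram, lifetimes).

    Hypothetical helper for illustration; dgm is an (n, 2) array of
    (birth, death) pairs such as one entry of ripser's 'dgms' list.
    """
    dgm = np.asarray(dgm, dtype=float).reshape(-1, 2)  # also handles empty diagrams
    clean = dgm[np.isfinite(dgm[:, 1])]
    return clean, clean[:, 1] - clean[:, 0]

# Toy H_0 diagram standing in for diagrams[i][0]; the last bar never dies.
toy_h0 = np.array([[0.0, 0.1], [0.0, 0.3], [0.0, np.inf]])
clean, life = finite_lifetimes(toy_h0)
print(clean.shape, life)
```

In the notebook the same call would be applied to `diagrams[i][0]` or `diagrams[i][1]`. The H_1 diagrams used in the exercises below normally contain only finite bars, so the filter is mainly a safeguard there.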
Exercise 2 — Vectorize using persistence images¶
In [ ]:
# YOUR TURN
# Use persim.PersistenceImager to vectorize the H_1 diagrams.
# Stack the resulting flat vectors into an (N, D) feature matrix X_img.
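To make clear what a persistence image encodes before you reach for `PersistenceImager`, here is a from-scratch NumPy sketch of the idea: map each bar to (birth, persistence), place a Gaussian there, weight it by persistence, and sample the sum on a fixed grid. This is not persim's implementation. The function name `toy_persistence_image`, its parameters, and the toy diagram are assumptions made for illustration, and the sketch evaluates the kernel at pixel centres rather than integrating it over each pixel as the usual definition does.

```python
import numpy as np

def toy_persistence_image(dgm, birth_range, pers_range, n_pixels=10, sigma=0.1):
    """Illustrative persistence image sampled at pixel centres (hypothetical helper)."""
    dgm = np.asarray(dgm, dtype=float).reshape(-1, 2)
    births = dgm[:, 0]
    pers = dgm[:, 1] - dgm[:, 0]  # (birth, death) -> (birth, persistence)

    # Centres of an n_pixels x n_pixels grid over the chosen ranges.
    b_edges = np.linspace(birth_range[0], birth_range[1], n_pixels + 1)
    p_edges = np.linspace(pers_range[0], pers_range[1], n_pixels + 1)
    b_mid = 0.5 * (b_edges[:-1] + b_edges[1:])
    p_mid = 0.5 * (p_edges[:-1] + p_edges[1:])
    B, P = np.meshgrid(b_mid, p_mid, indexing="ij")

    img = np.zeros((n_pixels, n_pixels))
    for b, p in zip(births, pers):
        gauss = np.exp(-((B - b) ** 2 + (P - p) ** 2) / (2 * sigma ** 2))
        img += p * gauss  # linear weight: long-lived bars contribute more
    return img

# Toy H_1 diagram: one long-lived loop and one short-lived noise bar.
toy_h1 = np.array([[0.25, 1.75], [0.30, 0.35]])
img = toy_persistence_image(toy_h1, birth_range=(0.0, 1.0), pers_range=(0.0, 2.0))
x_row = img.ravel()  # one row of the (N, D) feature matrix
print(img.shape, x_row.shape)
```

The important design point carries over to the real exercise: every diagram must be rendered on the same grid (same ranges and pixel size), otherwise the flattened vectors are not comparable across point clouds.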
Exercise 3 — Train a logistic-regression classifier on the persistence-image features¶
In [ ]:
# YOUR TURN
# 1. Train/test split (75/25)
# 2. Fit logistic regression on the persistence-image features.
# 3. Print the classification_report and the confusion matrix.
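The three steps above can be sketched as follows. Because the feature matrix is yours to build in Exercise 2, this sketch runs on synthetic stand-in features: `X_demo`, `y_demo`, the `random_state` values, and `max_iter=1000` are all assumptions for illustration. Stratifying the split is a suggestion rather than a requirement of the lab; with only 30 clouds per class it keeps the class balance similar in both halves.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Stand-in data: replace X_demo / y_demo with your X_img / y_data.
demo_rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1, 2], 30)
X_demo = demo_rng.normal(size=(90, 20)) + y_demo[:, None]

# 1. 75/25 split, stratified so each class appears in the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

# 2. Fit logistic regression (max_iter raised to avoid convergence warnings).
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# 3. Report per-class metrics and the confusion matrix.
print(classification_report(y_te, y_pred))
cm = confusion_matrix(y_te, y_pred, labels=[0, 1, 2])
print(cm)
```

Rows of the confusion matrix are true classes and columns are predicted classes, so off-diagonal entries show which point-cloud types the classifier confuses.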
Exercise 4 — Repeat for persistence landscapes and compare¶
In [ ]:
# YOUR TURN
# Vectorize via PersLandscapeApprox, train the same classifier,
# and report which vectorization performs better on this dataset.
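As a reference for what a sampled landscape contains, the sketch below computes it directly in NumPy: each bar (b, d) contributes a tent function max(0, min(t - b, d - t)), and the k-th landscape layer at t is the k-th largest tent value. This is a from-scratch illustration, not the `PersLandscapeApprox` implementation. The name `toy_landscape`, the grid, and the toy diagram are assumptions.

```python
import numpy as np

def toy_landscape(dgm, grid, num_layers=3):
    """Sample the first num_layers landscape layers on grid (hypothetical helper)."""
    dgm = np.asarray(dgm, dtype=float).reshape(-1, 2)
    b, d = dgm[:, [0]], dgm[:, [1]]  # column vectors, shape (n_bars, 1)
    # One tent function per bar, evaluated on the grid: shape (n_bars, len(grid)).
    tents = np.maximum(0.0, np.minimum(grid - b, d - grid))
    tents = -np.sort(-tents, axis=0)  # sort descending at each grid point
    out = np.zeros((num_layers, grid.size))
    k = min(num_layers, tents.shape[0])
    out[:k] = tents[:k]  # layers beyond the number of bars stay zero
    return out

grid = np.linspace(0.0, 2.0, 5)  # t = 0, 0.5, 1, 1.5, 2
toy_h1 = np.array([[0.0, 2.0], [0.5, 1.5]])
layers = toy_landscape(toy_h1, grid)
x_row = layers.ravel()  # fixed-length feature vector for one diagram
print(layers)
```

As with persistence images, the grid and the number of layers must be identical for every diagram so that the flattened vectors line up column by column in the feature matrix.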
Done?¶
Submit per the cohort schedule. Peer-review pairings will be announced the following Monday.