Lab 2 — Vectorizing persistence diagrams¶

Goal. Take diagrams from Lab 1, compute persistence landscapes and persistence images, and use them as features for a sklearn classifier that distinguishes the three point-cloud types.

What you ship. A notebook with two classifiers (one on landscapes, one on images) and a confusion matrix for each, plus a 200-word memo on which vectorization works better on this dataset and why.

Setup¶

Install the dependencies (one-time).

In [ ]:
# !pip install ripser persim scikit-learn matplotlib numpy
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from ripser import ripser
from persim import PersistenceImager, plot_diagrams
from persim.landscapes import PersLandscapeApprox, plot_landscape_simple
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

rng = np.random.default_rng(42)

Datasets — 90 point clouds, three classes (30 each)¶

In [ ]:
def noisy_circle(n=200, r=1.0, noise=0.05, seed=None):
    r_rng = np.random.default_rng(seed)
    theta = r_rng.uniform(0, 2*np.pi, n)
    pts = np.column_stack([r*np.cos(theta), r*np.sin(theta)])
    return pts + noise * r_rng.normal(size=pts.shape)

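def two_circles(n=300, sep=2.5, noise=0.05, seed=None):
    # Give the second circle its own seed so it is not a shifted copy of the first.
    # Compare against None explicitly: a bare `if seed` would treat seed=0 as "no seed".
    seed2 = seed + 1 if seed is not None else None
    c1 = noisy_circle(n//2, noise=noise, seed=seed)
    c2 = noisy_circle(n//2, noise=noise, seed=seed2) + np.array([sep, 0.0])
    return np.vstack([c1, c2])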

def random_blob(n=200, noise=0.4, seed=None):
    r_rng = np.random.default_rng(seed)
    return r_rng.normal(scale=noise, size=(n, 2))

X_data, y_data = [], []
for i in range(30):
    X_data.append(noisy_circle(seed=i));      y_data.append(0)
    X_data.append(two_circles(seed=100+i));   y_data.append(1)
    X_data.append(random_blob(seed=200+i));   y_data.append(2)
y_data = np.array(y_data)
print('dataset:', len(X_data), 'point clouds,', np.bincount(y_data), 'per class')

Exercise 1 — Compute persistence diagrams for all 90 clouds¶

In [ ]:
diagrams = []
for X in X_data:
    res = ripser(X, maxdim=1)
    diagrams.append(res['dgms'])
print(f'{len(diagrams)} diagrams computed')
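
Before vectorizing, it can help to look at what one entry of `diagrams` holds: a list with one `(birth, death)` array per homology dimension, where the H_0 array contains a bar that never dies (death = inf). The sketch below uses a made-up toy diagram rather than real ripser output, and the helper names `finite_bars` and `lifetimes` are hypothetical, introduced only for illustration.

```python
import numpy as np

# Toy stand-in for one entry of `diagrams`: [H0, H1], each an array of (birth, death).
# The numbers are invented for illustration; only the layout mirrors ripser's output.
toy_dgms = [
    np.array([[0.0, 0.05], [0.0, 0.08], [0.0, np.inf]]),  # H0, with one infinite bar
    np.array([[0.20, 1.60], [0.30, 0.35]]),               # H1: one long loop, one short
]

def finite_bars(dgm):
    """Keep only bars whose death time is finite (hypothetical helper)."""
    dgm = np.asarray(dgm, dtype=float).reshape(-1, 2)
    return dgm[np.isfinite(dgm[:, 1])]

def lifetimes(dgm):
    """Persistence (death - birth) of each finite bar (hypothetical helper)."""
    dgm = finite_bars(dgm)
    return dgm[:, 1] - dgm[:, 0]

h1_life = lifetimes(toy_dgms[1])
print("H1 lifetimes:", h1_life)
print("longest H1 bar:", h1_life.max() if h1_life.size else 0.0)
```

A single long H_1 bar is what you would expect from one circle, so the longest lifetime is a quick sanity check before building features.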

Exercise 2 — Vectorize using persistence images¶

In [ ]:
# YOUR TURN
# Use persim.PersistenceImager to vectorize the H_1 diagrams.
# Stack the resulting flat vectors into an (N, D) feature matrix X_img.
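
To make the idea behind a persistence image concrete, here is a hand-rolled NumPy sketch on toy diagrams. It is not persim's implementation and does not replace the exercise: it evaluates a Gaussian at pixel centres instead of integrating over each pixel, and the function name `toy_persistence_image`, the ranges, and `sigma` are all assumptions chosen for illustration. The point to carry over is that every diagram must be rasterized on the same grid so the flattened vectors are comparable.

```python
import numpy as np

def toy_persistence_image(dgm, birth_range=(0.0, 1.0), pers_range=(0.0, 2.0),
                          n_pixels=10, sigma=0.1):
    """Illustrative persistence image on a fixed n_pixels x n_pixels grid."""
    dgm = np.asarray(dgm, dtype=float).reshape(-1, 2)
    dgm = dgm[np.isfinite(dgm[:, 1])]      # drop infinite bars
    birth = dgm[:, 0]
    pers = dgm[:, 1] - dgm[:, 0]           # move to birth-persistence coordinates

    # Pixel centres along each axis (same grid for every diagram).
    b_edges = np.linspace(*birth_range, n_pixels + 1)
    p_edges = np.linspace(*pers_range, n_pixels + 1)
    b_mid = 0.5 * (b_edges[:-1] + b_edges[1:])
    p_mid = 0.5 * (p_edges[:-1] + p_edges[1:])
    B, P = np.meshgrid(b_mid, p_mid, indexing="ij")

    img = np.zeros((n_pixels, n_pixels))
    for b, p in zip(birth, pers):
        gauss = np.exp(-((B - b) ** 2 + (P - p) ** 2) / (2 * sigma ** 2))
        img += p * gauss                   # linear weight: longer bars contribute more
    return img

# Three toy H_1 diagrams, including an empty one.
toy_h1 = [
    np.array([[0.2, 1.6], [0.3, 0.35]]),
    np.array([[0.1, 0.15]]),
    np.empty((0, 2)),
]
X_toy = np.stack([toy_persistence_image(d).ravel() for d in toy_h1])
print(X_toy.shape)  # (3, 100)
```

An empty diagram maps to an all-zero vector here, which keeps the feature matrix rectangular even when a cloud has no H_1 bars.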

Exercise 3 — Train a logistic-regression classifier on the persistence images¶

In [ ]:
# YOUR TURN
# 1. Train/test split (75/25)
# 2. Fit logistic regression on the persistence-image features.
# 3. Print the classification_report and the confusion matrix.
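
The three steps above can be sketched on synthetic features, so the mechanics are visible without giving away the persistence-image result. Everything named `*_demo` is made-up stand-in data, and `stratify`, `random_state`, and `max_iter=1000` are choices made for this sketch rather than requirements of the lab.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in features: 90 samples, 3 classes of 30, 20 dimensions.
demo_rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1, 2], 30)
X_demo = demo_rng.normal(size=(90, 20)) + 3.0 * y_demo[:, None]

# 75/25 split; stratify keeps the three classes balanced in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

print(classification_report(y_te, y_pred))
cm = confusion_matrix(y_te, y_pred, labels=[0, 1, 2])
print(cm)  # rows are true classes, columns are predicted classes
```

With only 90 samples the test set is small, so a single misclassified cloud moves the scores noticeably; fixing `random_state` at least makes the comparison between vectorizations repeatable.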

Exercise 4 — Repeat for persistence landscapes and compare¶

In [ ]:
# YOUR TURN
# Vectorize via PersLandscapeApprox, train the same classifier,
# and report which vectorization performs better on this dataset.
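
For intuition about what a landscape vector contains, here is a hand-rolled NumPy sketch on a toy diagram. It is not how `PersLandscapeApprox` is implemented and is not a substitute for the exercise; the function name `toy_landscape`, the grid, and `n_levels` are assumptions for illustration. Each bar becomes a tent function, and level k of the landscape is the k-th largest tent value at each grid point.

```python
import numpy as np

def toy_landscape(dgm, grid, n_levels=3):
    """Evaluate the first n_levels landscape functions on `grid` (illustrative only)."""
    dgm = np.asarray(dgm, dtype=float).reshape(-1, 2)
    dgm = dgm[np.isfinite(dgm[:, 1])]      # drop infinite bars
    out = np.zeros((n_levels, grid.size))
    if dgm.shape[0] == 0:
        return out                         # empty diagram: all-zero landscape
    b = dgm[:, 0][:, None]
    d = dgm[:, 1][:, None]
    # One tent per bar: rises from b, peaks at (b + d) / 2 with height (d - b) / 2.
    tents = np.clip(np.minimum(grid[None, :] - b, d - grid[None, :]), 0.0, None)
    # Sort tents in descending order at each grid point; row k is landscape level k.
    tents_sorted = -np.sort(-tents, axis=0)
    k = min(n_levels, tents_sorted.shape[0])
    out[:k] = tents_sorted[:k]             # pad with zeros if there are fewer bars
    return out

grid = np.linspace(0.0, 2.0, 50)
toy_h1 = np.array([[0.2, 1.6], [0.3, 0.35]])
L = toy_landscape(toy_h1, grid)
feature_vector = L.ravel()
print(L.shape, feature_vector.shape)  # (3, 50) (150,)
```

Diagrams with different numbers of bars produce different numbers of nonzero levels, so fixing both the grid and the number of levels (padding with zeros) is what makes the vectors stackable into one feature matrix.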

Done?¶

Submit per the cohort schedule. Peer-review pairings will be announced the following Monday.