Lab 4 — Statistical tests on persistence diagrams¶
Goal. Two-sample test on diagrams: simulate two populations of point clouds that differ in noise level, compute their H_1 persistence diagrams, and test for a difference using sliced-Wasserstein distances and a permutation test.
What you ship. Notebook reporting a p-value, a power curve as the noise difference shrinks, and a 200-word memo on what the test does and does not tell you.
Setup¶
Install the dependencies (one-time).
In [ ]:
# !pip install ripser persim scikit-learn scipy matplotlib numpy
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from ripser import ripser
from persim import sliced_wasserstein
from scipy.stats import permutation_test
rng = np.random.default_rng(42)
Two populations of 50 noisy circles each¶
In [ ]:
def population(n_clouds=50, n_points=200, noise=0.05, seed_base=0):
    """Return the H_1 persistence diagrams of n_clouds noisy unit circles."""
    diagrams = []
    for i in range(n_clouds):
        r = np.random.default_rng(seed_base + i)  # one reproducible stream per cloud
        theta = r.uniform(0, 2*np.pi, n_points)
        X = np.column_stack([np.cos(theta), np.sin(theta)])
        X += noise * r.normal(size=X.shape)  # isotropic Gaussian noise
        diagrams.append(ripser(X, maxdim=1)['dgms'][1])  # index 1 selects H_1
    return diagrams
pop_A = population(noise=0.05, seed_base=0)
pop_B = population(noise=0.10, seed_base=1000)
print('populations built:', len(pop_A), len(pop_B))
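Before running any test, it can help to check that the two populations look different at all. The sketch below summarises one diagram by its longest finite lifetime; `max_persistence` and `toy_dgm` are hypothetical names introduced here, not part of ripser or persim.

```python
import numpy as np

def max_persistence(dgm):
    """Largest finite lifetime (death - birth) in one diagram; 0.0 if it is empty."""
    dgm = np.asarray(dgm, dtype=float).reshape(-1, 2)
    lifetimes = dgm[:, 1] - dgm[:, 0]
    lifetimes = lifetimes[np.isfinite(lifetimes)]  # drop any infinite bars
    return float(lifetimes.max()) if lifetimes.size else 0.0

# Toy diagram with two (birth, death) pairs, used only to exercise the helper.
toy_dgm = np.array([[0.1, 0.2], [0.3, 1.5]])
print(max_persistence(toy_dgm))
```

In the notebook, comparing `np.mean([max_persistence(d) for d in pop_A])` with the same quantity for `pop_B` gives a rough first look; one would expect the noisier population to show a shorter-lived dominant loop, but this is a sanity check, not a test.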
Exercise 1 — Compute pairwise sliced-Wasserstein distance matrix¶
In [ ]:
# YOUR TURN
# Compute the (100, 100) pairwise distance matrix using persim.sliced_wasserstein.
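As a starting point, here is a minimal sketch of filling a symmetric pairwise matrix from any diagram distance. The helper `pairwise_distance_matrix` and the stand-in distance `total_persistence_gap` are hypothetical names introduced for illustration; the stand-in is NOT the sliced-Wasserstein distance and is there only so the block runs without ripser or persim.

```python
import numpy as np

def pairwise_distance_matrix(diagrams, dist):
    """Return the symmetric matrix D with D[i, j] = dist(diagrams[i], diagrams[j])."""
    n = len(diagrams)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; the diagonal stays 0
            D[i, j] = D[j, i] = dist(diagrams[i], diagrams[j])
    return D

def total_persistence_gap(a, b):
    """Stand-in distance for a quick check (assumption: NOT sliced Wasserstein)."""
    return abs(np.sum(a[:, 1] - a[:, 0]) - np.sum(b[:, 1] - b[:, 0]))

# Three toy diagrams, each an array of (birth, death) rows.
toy = [
    np.array([[0.0, 1.0]]),
    np.array([[0.0, 0.5], [0.2, 0.4]]),
    np.array([[0.1, 0.3]]),
]
D_toy = pairwise_distance_matrix(toy, total_persistence_gap)
print(D_toy)
```

In the notebook, `pairwise_distance_matrix(pop_A + pop_B, sliced_wasserstein)` should give the (100, 100) matrix, with rows 0 to 49 belonging to population A and rows 50 to 99 to population B. Filling only the upper triangle halves the number of distance calls (4950 instead of 10000).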
Exercise 2 — Permutation test¶
In [ ]:
# YOUR TURN
# Test whether the mean within-A distance differs from the mean between-AB distance.
# Use scipy.stats.permutation_test with n_resamples=1000.
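One possible way to set this up, sketched under assumptions: because the entries of a distance matrix are not independent observations, the sketch permutes the group labels (as index arrays) rather than the distances themselves. The helper `between_minus_within` is a hypothetical name, and the distance matrix here is synthetic (absolute differences of 1-D points), standing in for the sliced-Wasserstein matrix from Exercise 1.

```python
import numpy as np
from scipy.stats import permutation_test

def between_minus_within(D):
    """Build a statistic on index arrays: mean between-group distance
    minus mean within-first-group distance (diagonal excluded)."""
    def statistic(idx_a, idx_b):
        a = np.asarray(idx_a).astype(int)
        b = np.asarray(idx_b).astype(int)
        within = D[np.ix_(a, a)][np.triu_indices(len(a), k=1)].mean()
        between = D[np.ix_(a, b)].mean()
        return between - within
    return statistic

# Synthetic stand-in data: two groups of 20 points on the real line.
r = np.random.default_rng(0)
x = np.concatenate([r.normal(0.0, 1.0, 20), r.normal(3.0, 1.0, 20)])
D = np.abs(x[:, None] - x[None, :])

res = permutation_test(
    (np.arange(20), np.arange(20, 40)),  # the "data" are group index arrays
    between_minus_within(D),
    permutation_type='independent',      # pool the indices and reshuffle group labels
    vectorized=False,
    n_resamples=1000,
)
print(res.statistic, res.pvalue)
```

For the lab data, the index arrays would be `np.arange(50)` and `np.arange(50, 100)` with the (100, 100) matrix. The default alternative is two-sided, which matches the "differs" wording of the exercise.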
Exercise 3 — Power curve¶
In [ ]:
# YOUR TURN
# For noise difference in {0.02, 0.04, 0.06, 0.08, 0.10}, estimate the power
# of the test at alpha=0.05 over 50 replications each.
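Power at a given noise difference can be estimated as the fraction of replications in which the test rejects at level alpha. The sketch below shows that bookkeeping, plus a binomial standard error, which is worth reporting with only 50 replications. `estimate_power` and `toy_pvalue` are hypothetical names; `toy_pvalue` is a fast stand-in (a permutation test on the means of two normal samples) so the block runs in seconds, and is not the diagram test itself.

```python
import numpy as np

def estimate_power(pvalue_fn, deltas, n_reps=50, alpha=0.05):
    """Rejection rate at level alpha for each delta, with a binomial standard error."""
    power = []
    for delta in deltas:
        rejections = [pvalue_fn(delta, rep) < alpha for rep in range(n_reps)]
        power.append(np.mean(rejections))
    power = np.array(power)
    stderr = np.sqrt(power * (1 - power) / n_reps)
    return power, stderr

def toy_pvalue(delta, rep, n=30, n_resamples=200):
    """Stand-in test (assumption): permutation test on a mean shift of 10 * delta."""
    r = np.random.default_rng([rep, int(round(delta * 1000))])
    x = r.normal(0.0, 1.0, n)
    y = r.normal(10 * delta, 1.0, n)
    pooled = np.concatenate([x, y])
    observed = abs(y.mean() - x.mean())
    count = 0
    for _ in range(n_resamples):
        perm = r.permutation(pooled)
        count += abs(perm[n:].mean() - perm[:n].mean()) >= observed
    return (count + 1) / (n_resamples + 1)

deltas = [0.02, 0.04, 0.06, 0.08, 0.10]
power, stderr = estimate_power(toy_pvalue, deltas)
print(power, stderr)
```

In the notebook, `pvalue_fn(delta, rep)` would instead rebuild both populations (for example `population(noise=0.05)` and `population(noise=0.05 + delta)`, with `seed_base` depending on `rep` so replications differ), recompute the distance matrix, and return the permutation p-value. The 0.05 baseline is an assumption carried over from `pop_A`. Plotting `power` against `deltas` with `plt.errorbar` gives the power curve.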
Done?¶
Submit per the cohort schedule. Peer-review pairings will be announced the following Monday.