Lab 4 — Statistical tests on persistence diagrams¶

Goal. Run a two-sample test on persistence diagrams: simulate two populations of noisy circles that differ in noise level, compute their H_1 diagrams, and test for a difference using the sliced-Wasserstein distance and a permutation test.

What you ship. A notebook reporting a p-value, a power curve as the noise difference shrinks, and a 200-word memo on what the test does and does not tell you.
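For intuition about the sliced-Wasserstein distance named in the goal, here is a minimal NumPy sketch of the idea: pad each diagram with the other's diagonal projections, project both onto a set of directions, and compare the sorted projections. The function name `sliced_wasserstein_sketch`, the parameter `n_directions`, and the averaging normalization are assumptions made for illustration; this is not persim's implementation and its values need not match `persim.sliced_wasserstein`.

```python
import numpy as np

def sliced_wasserstein_sketch(dgm1, dgm2, n_directions=50):
    """Approximate sliced-Wasserstein distance between two diagrams (illustrative only)."""
    dgm1 = np.asarray(dgm1, dtype=float).reshape(-1, 2)
    dgm2 = np.asarray(dgm2, dtype=float).reshape(-1, 2)
    # Orthogonal projection of every point onto the diagonal birth == death.
    diag1 = np.repeat(dgm1.mean(axis=1, keepdims=True), 2, axis=1)
    diag2 = np.repeat(dgm2.mean(axis=1, keepdims=True), 2, axis=1)
    # Pad each diagram with the other's diagonal points so both have the same size.
    aug1 = np.vstack([dgm1, diag2])
    aug2 = np.vstack([dgm2, diag1])
    # Directions covering a half circle (assumed discretization).
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_directions, endpoint=False)
    total = 0.0
    for t in thetas:
        direction = np.array([np.cos(t), np.sin(t)])
        # 1-D optimal transport between equal-size point sets: match sorted values.
        p1 = np.sort(aug1 @ direction)
        p2 = np.sort(aug2 @ direction)
        total += np.abs(p1 - p2).sum()
    return total / n_directions

# Hand-made toy diagrams (rows are birth, death); not ripser output.
toy_a = np.array([[0.1, 1.5], [0.2, 0.3]])
toy_b = np.array([[0.1, 1.2]])
print(sliced_wasserstein_sketch(toy_a, toy_b))
```

The diagonal padding is what lets diagrams with different numbers of points be compared: unmatched points are effectively transported to the diagonal.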

Setup¶

Install the dependencies (one-time).

In [ ]:
# !pip install ripser persim scipy scikit-learn matplotlib numpy
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from ripser import ripser
from persim import sliced_wasserstein
from scipy.stats import permutation_test

rng = np.random.default_rng(42)

Two populations of 50 noisy circles each¶

In [ ]:
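def population(n_clouds=50, n_points=200, noise=0.05, seed_base=0):
    """Return H_1 persistence diagrams for n_clouds noisy unit circles."""
    diagrams = []
    for i in range(n_clouds):
        # One generator per cloud, so each cloud is reproducible on its own.
        r = np.random.default_rng(seed_base + i)
        # Sample angles uniformly, then place the points on the unit circle.
        theta = r.uniform(0, 2*np.pi, n_points)
        X = np.column_stack([np.cos(theta), np.sin(theta)])
        # Add isotropic Gaussian noise with standard deviation `noise`.
        X += noise * r.normal(size=X.shape)
        # Keep only the H_1 (loop) diagram from the Vietoris-Rips filtration.
        diagrams.append(ripser(X, maxdim=1)['dgms'][1])
    return diagrams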

pop_A = population(noise=0.05, seed_base=0)
pop_B = population(noise=0.10, seed_base=1000)
print('populations built:', len(pop_A), len(pop_B))
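Before running any test, it can help to sanity-check the diagrams. The sketch below computes the largest lifetime (death minus birth) in each diagram. The helper `max_lifetime` and the toy arrays are hypothetical, chosen so the block runs without ripser; the only assumption about the real data is that each diagram is an array with one (birth, death) row per feature.

```python
import numpy as np

def max_lifetime(dgm):
    """Largest death - birth in one diagram; 0.0 if the diagram is empty."""
    dgm = np.asarray(dgm, dtype=float).reshape(-1, 2)
    if len(dgm) == 0:
        return 0.0
    return float((dgm[:, 1] - dgm[:, 0]).max())

# Hand-made stand-ins for H_1 diagrams (rows are birth, death); not ripser output.
toy_population = [
    np.array([[0.20, 1.60], [0.10, 0.15]]),  # one long bar plus one short bar
    np.array([[0.25, 1.50]]),                # a single long bar
    np.empty((0, 2)),                        # an empty diagram
]
summary = np.array([max_lifetime(d) for d in toy_population])
print(summary)
```

On the lab data the same call would be `[max_lifetime(d) for d in pop_A]`. For points sampled near a circle one would expect a single dominant bar per diagram, though that is worth checking rather than assuming.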

Exercise 1 — Compute pairwise sliced-Wasserstein distance matrix¶

In [ ]:
# YOUR TURN
# Pool the diagrams as pop_A + pop_B (A first, then B) and compute the symmetric
# (100, 100) pairwise distance matrix using persim.sliced_wasserstein.
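One possible shape for this step, as a sketch rather than a reference solution: a helper that fills a symmetric matrix from any two-argument distance function, evaluating each unordered pair once. The names `pairwise_distances` and `toy_dist` are hypothetical, and `toy_dist` is only a placeholder so the block runs without persim.

```python
import numpy as np

def pairwise_distances(items, dist):
    """Symmetric matrix of dist(items[i], items[j]), computed once per unordered pair."""
    n = len(items)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Assumes dist is symmetric, so the mirrored entry can be copied.
            D[i, j] = D[j, i] = dist(items[i], items[j])
    return D

def toy_dist(d1, d2):
    """Placeholder distance: absolute difference in total lifetime."""
    return abs((d1[:, 1] - d1[:, 0]).sum() - (d2[:, 1] - d2[:, 0]).sum())

# Hand-made toy diagrams (rows are birth, death); not ripser output.
toy_diagrams = [
    np.array([[0.0, 1.0]]),
    np.array([[0.0, 1.5]]),
    np.array([[0.2, 0.4], [0.0, 1.0]]),
]
D_toy = pairwise_distances(toy_diagrams, toy_dist)
print(D_toy)
```

In the lab itself the call would presumably be `pairwise_distances(pop_A + pop_B, sliced_wasserstein)`, which evaluates 4950 pairs for 100 diagrams instead of all 10000 entries.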

Exercise 2 — Permutation test¶

In [ ]:
# YOUR TURN
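# Test whether the mean within-A distance differs from the mean between-AB distance,
# using the distance matrix from Exercise 1. Under the null hypothesis the A/B labels
# are exchangeable, so permute them. Use scipy.stats.permutation_test with n_resamples=1000.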
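To make the logic of the test concrete, here is a hand-rolled label-permutation sketch in NumPy. It is not the `scipy.stats.permutation_test` call the exercise asks for; it only spells out what that call automates. The function names, the toy 1-D data, and the two-sided rule based on absolute values are assumptions made for illustration (scipy's own two-sided convention may differ).

```python
import numpy as np

def between_minus_within(D, idx_a, idx_b):
    """Mean A-to-B distance minus mean within-A distance (off-diagonal pairs only)."""
    upper = np.triu_indices(len(idx_a), k=1)
    within_a = D[np.ix_(idx_a, idx_a)][upper].mean()
    between = D[np.ix_(idx_a, idx_b)].mean()
    return between - within_a

def label_permutation_pvalue(D, n_a, n_resamples=1000, seed=0):
    """Shuffle which rows count as group A and recompute the statistic each time."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    observed = between_minus_within(D, np.arange(n_a), np.arange(n_a, n))
    null = np.empty(n_resamples)
    for k in range(n_resamples):
        perm = rng.permutation(n)
        null[k] = between_minus_within(D, perm[:n_a], perm[n_a:])
    # The +1 terms count the observed labelling as one of the permutations.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_resamples)
    return observed, p

# Toy stand-in for the diagram distance matrix: distances between 1-D values.
toy_rng = np.random.default_rng(1)
x = np.concatenate([toy_rng.normal(0.0, 1.0, 20), toy_rng.normal(5.0, 1.0, 20)])
D_toy = np.abs(x[:, None] - x[None, :])
stat, p_value = label_permutation_pvalue(D_toy, n_a=20)
print(stat, p_value)
```

Because the distance matrix is computed once and only the labels are shuffled, each resample is cheap. With 1000 resamples the smallest attainable p-value under this rule is 1/1001.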

Exercise 3 — Power curve¶

In [ ]:
# YOUR TURN
# For each noise difference in {0.02, 0.04, 0.06, 0.08, 0.10}, estimate the power
# of the test at alpha=0.05 (the fraction of replications with p < 0.05)
# over 50 replications each, then plot power against noise difference.
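The power estimate is a rejection rate over repeated simulations. The sketch below shows that loop with a pluggable p-value function. `estimate_power` and `toy_pvalue` are hypothetical names, and the toy test (a permutation test on the difference in means of two normal samples) stands in for the diagram-based test so the block runs quickly without ripser or persim.

```python
import numpy as np

def estimate_power(pvalue_fn, effects, n_reps=50, alpha=0.05, seed=0):
    """For each effect size, the fraction of replications with p < alpha."""
    rng = np.random.default_rng(seed)
    power = []
    for effect in effects:
        rejections = sum(pvalue_fn(effect, rng) < alpha for _ in range(n_reps))
        power.append(rejections / n_reps)
    return np.array(power)

def toy_pvalue(effect, rng, n=20, n_resamples=200):
    """Placeholder test: permutation p-value for a difference in means."""
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    pooled = np.concatenate([a, b])
    observed = abs(b.mean() - a.mean())
    count = 0
    for _ in range(n_resamples):
        perm = rng.permutation(pooled)
        count += abs(perm[n:].mean() - perm[:n].mean()) >= observed
    return (1 + count) / (1 + n_resamples)

toy_effects = [0.0, 2.0]
toy_power = estimate_power(toy_pvalue, toy_effects, n_reps=30)
print(dict(zip(toy_effects, toy_power)))
```

For the lab, `pvalue_fn` would rebuild both populations at the given noise difference, recompute the distance matrix, and return the permutation p-value. Each replication needs fresh seeds, otherwise the replications are identical and the estimate is meaningless.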

Done?¶

Submit per the cohort schedule. Peer-review pairings will be announced the following Monday.