Lab 2 — SVM with a kernel sweep
Goal. Train SVMs with linear, polynomial, and RBF kernels. Tune C and gamma (gamma applies only to the polynomial and RBF kernels). Compare against a logistic regression baseline. Discuss the kernel-trick trade-off.
What you ship. Notebook with 3 kernels × tuned hyperparameters, a comparison table, and a 200-word note on the computational cost.
Setup
Install the dependencies (one-time).
In [ ]:
# !pip install scikit-learn pandas matplotlib numpy
In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.datasets import fetch_openml
import time
Adult Census Income (UCI, public)
In [ ]:
adult = fetch_openml('adult', version=2, as_frame=True)
X = adult.data
y = (adult.target == '>50K').astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
print('train/test:', X_train.shape, X_test.shape)
Exercise 1 — Logistic regression baseline
In [ ]:
# YOUR TURN
# Build a preprocessing pipeline (one-hot for categoricals, scale for numerics).
# Fit logistic regression. Report accuracy and AUC.
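One possible starting point for this exercise is sketched below. It is not the reference solution: the helper names `make_preprocessor` and `fit_logreg_baseline` are hypothetical, `max_iter=1000` is an arbitrary choice to help convergence, and the sketch assumes a scikit-learn version whose `OneHotEncoder` accepts missing values in categorical columns (otherwise add an imputation step first).

```python
import numpy as np
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_preprocessor():
    """Standardize numeric columns and one-hot encode all other columns."""
    return ColumnTransformer([
        ("num", StandardScaler(), make_column_selector(dtype_include=np.number)),
        # handle_unknown="ignore" keeps categories unseen during fit from raising at predict time.
        ("cat", OneHotEncoder(handle_unknown="ignore"),
         make_column_selector(dtype_exclude=np.number)),
    ])


def fit_logreg_baseline(X_train, y_train, X_test, y_test):
    """Fit the baseline pipeline and return (fitted model, accuracy, AUC)."""
    model = Pipeline([
        ("prep", make_preprocessor()),
        ("clf", LogisticRegression(max_iter=1000)),  # assumed iteration budget
    ])
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    # AUC needs continuous scores, so use the positive-class probability.
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return model, acc, auc
```

In the notebook this would be called as `model, acc, auc = fit_logreg_baseline(X_train, y_train, X_test, y_test)`. Keeping the preprocessing inside the pipeline means the scaler and encoder are fitted on training data only.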
Exercise 2 — SVM with three kernels
In [ ]:
# YOUR TURN
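# For each kernel in {linear, poly, rbf}, grid-search over C.
# Also search over gamma for poly and rbf; the linear kernel ignores gamma.
# Train on a 20k random subsample of the train set: SVC fit time grows at least
# quadratically with the number of samples.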
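The sweep described above could be organized roughly as follows. This is a sketch under stated assumptions, not a prescribed solution: the names `subsample`, `sweep_kernels`, and `PARAM_GRIDS` are hypothetical, the grid values are illustrative and deliberately small, and `degree` is added to the polynomial grid as an optional extra. The subsample is drawn with `train_test_split` so that it stays stratified.

```python
import numpy as np
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Illustrative grids (assumed values). gamma is omitted for the linear kernel,
# which does not use it.
PARAM_GRIDS = {
    "linear": {"clf__C": [0.1, 1, 10]},
    "poly": {"clf__C": [0.1, 1, 10], "clf__gamma": ["scale", 0.01], "clf__degree": [2, 3]},
    "rbf": {"clf__C": [0.1, 1, 10], "clf__gamma": ["scale", 0.01, 0.1]},
}


def make_preprocessor():
    """Standardize numeric columns and one-hot encode all other columns."""
    return ColumnTransformer([
        ("num", StandardScaler(), make_column_selector(dtype_include=np.number)),
        ("cat", OneHotEncoder(handle_unknown="ignore"),
         make_column_selector(dtype_exclude=np.number)),
    ])


def subsample(X, y, n=20_000, seed=42):
    """Draw a stratified random subsample of n rows (n must be smaller than len(X))."""
    X_sub, _, y_sub, _ = train_test_split(
        X, y, train_size=n, stratify=y, random_state=seed
    )
    return X_sub, y_sub


def sweep_kernels(X, y, cv=3, n_jobs=None):
    """Run one grid search per kernel and return {kernel: fitted GridSearchCV}."""
    searches = {}
    for kernel, grid in PARAM_GRIDS.items():
        pipe = Pipeline([("prep", make_preprocessor()), ("clf", SVC(kernel=kernel))])
        # scoring="roc_auc" ranks by SVC.decision_function, so probability=True is not needed.
        search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=cv, n_jobs=n_jobs)
        search.fit(X, y)
        searches[kernel] = search
    return searches
```

A typical call would be `X_sub, y_sub = subsample(X_train, y_train)` followed by `searches = sweep_kernels(X_sub, y_sub)`; each entry then exposes `best_params_` and `best_score_`. Expect the polynomial and RBF searches to take much longer than the linear one at 20k rows.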
Exercise 3 — Compare time and accuracy
In [ ]:
# YOUR TURN
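# Print: model, training time, AUC. Discuss the trade-off in 200 words.
# For AUC, score with continuous outputs (decision_function or predict_proba),
# not hard class predictions.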
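A comparison table of this kind could be produced with a small helper like the one below. It is a sketch only: `positive_scores` and `time_and_score` are hypothetical names, and wall-clock timing with `time.perf_counter` is one reasonable choice among several. The helper prefers `decision_function` because AUC depends only on the ranking of the scores, and an `SVC` fitted without `probability=True` has no `predict_proba`.

```python
import time

import pandas as pd
from sklearn.metrics import roc_auc_score


def positive_scores(model, X):
    """Return continuous scores for the positive class, suitable for AUC."""
    if hasattr(model, "decision_function"):
        return model.decision_function(X)
    return model.predict_proba(X)[:, 1]


def time_and_score(models, X_train, y_train, X_test, y_test):
    """Fit each model in {name: estimator}, timing the fit and scoring test AUC."""
    rows = []
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        elapsed = time.perf_counter() - start
        auc = roc_auc_score(y_test, positive_scores(model, X_test))
        rows.append({"model": name, "train_time_s": elapsed, "auc": auc})
    # Best AUC first, so the table reads as a ranking.
    return pd.DataFrame(rows).sort_values("auc", ascending=False).reset_index(drop=True)
```

For a fair comparison, fit every model on the same training rows. If the SVMs were tuned with `GridSearchCV`, its `refit_time_` attribute also records how long the final refit of the best configuration took.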
Done?
Submit per the cohort schedule. Peer-review pairings will be announced the following Monday.