Lab 2 — Tabular Q-learning on FrozenLake and Taxi

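Goal. Train a tabular Q-learning agent on FrozenLake-v1 and on Taxi-v3. For each environment, report the learning curve and the final greedy-policy performance (success rate on FrozenLake, average return on Taxi).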

What you ship. A notebook containing both trained Q-tables, the learning curves, and a 200-word note on which hyperparameter mattered most.

Setup

Install the dependencies (one-time).

In [ ]:
# !pip install gymnasium numpy matplotlib
In [ ]:
import gymnasium as gym
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

Environments

In [ ]:
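# is_slippery=True makes transitions stochastic: the agent can slide sideways.
env_lake = gym.make('FrozenLake-v1', is_slippery=True)
env_taxi = gym.make('Taxi-v3')
# With the default 4x4 map, FrozenLake has 16 states and 4 actions.
print('FrozenLake states/actions:', env_lake.observation_space.n, env_lake.action_space.n)
# Taxi has 500 states and 6 actions.
print('Taxi states/actions:', env_taxi.observation_space.n, env_taxi.action_space.n)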

Exercise 1 — Implement tabular Q-learning

In [ ]:
# YOUR TURN
# Write q_learn(env, episodes, alpha, gamma, eps_start, eps_end, eps_decay).
# It should return the Q-table and the total reward collected in each episode.
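
The building blocks of this exercise can be sketched in isolation. This is a minimal illustration, not a reference solution. The helper names `epsilon_greedy`, `td_update`, and `decay_epsilon` are hypothetical. Two assumptions are made that the exercise does not specify: ties in the greedy choice are broken at random (useful while the table is still all zeros), and `eps_decay` is treated as a multiplicative per-episode factor.

```python
import numpy as np


def epsilon_greedy(Q, state, eps, rng):
    """Pick a random action with probability eps, otherwise a greedy one."""
    n_actions = Q.shape[1]
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    # Assumption: break ties at random so an all-zero row
    # does not always yield action 0.
    best = np.flatnonzero(Q[state] == Q[state].max())
    return int(rng.choice(best))


def td_update(Q, state, action, reward, next_state, terminated, alpha, gamma):
    """Apply one Q-learning update to Q in place."""
    # A terminal state has no future value, so do not bootstrap from it.
    bootstrap = 0.0 if terminated else gamma * Q[next_state].max()
    Q[state, action] += alpha * (reward + bootstrap - Q[state, action])


def decay_epsilon(eps, eps_end, eps_decay):
    """Assumed schedule: multiplicative decay per episode, floored at eps_end."""
    return max(eps_end, eps * eps_decay)
```

In a training loop built on these pieces, Gymnasium's `env.step` returns `(obs, reward, terminated, truncated, info)`. The episode ends when either flag is set, but only `terminated` should switch off bootstrapping, since a time-limit truncation says nothing about the value of the state that was reached.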

Exercise 2 — Train on FrozenLake and plot the learning curve

In [ ]:
# YOUR TURN
# Train for 20_000 episodes. Plot 100-episode rolling mean of reward.
# Print the final greedy-policy success rate over 1000 evaluation episodes.
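
The rolling mean and the greedy evaluation asked for here could be written along the following lines. This is a sketch under stated assumptions: `rolling_mean` and `evaluate_greedy` are hypothetical helper names, the reward list is assumed to hold at least `window` episodes, and `env` is assumed to follow the Gymnasium `reset`/`step` interface.

```python
import numpy as np


def rolling_mean(values, window=100):
    """Mean over each full window of consecutive episodes."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" drops the first window-1 points instead of zero-padding,
    # so the curve does not start with an artificial ramp.
    return np.convolve(values, kernel, mode="valid")


def evaluate_greedy(env, Q, episodes=1000):
    """Average undiscounted return of the greedy policy, with no exploration."""
    returns = []
    for _ in range(episodes):
        state, _ = env.reset()
        done, total = False, 0.0
        while not done:
            action = int(np.argmax(Q[state]))
            state, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns))
```

On FrozenLake the only nonzero reward is the 1 given for reaching the goal, so the average return from `evaluate_greedy` is also the success rate. The output of `rolling_mean` can be passed straight to `plt.plot`.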

Exercise 3 — Train on Taxi and tune

In [ ]:
# YOUR TURN
# Train for 30_000 episodes. Tune alpha and eps_decay. Report final
# average return over 1000 evaluation episodes.
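
One simple way to organize the tuning is a small grid over the two hyperparameters. The sketch below assumes a hypothetical callable `train_and_score(alpha, eps_decay)` that you would write yourself: it trains an agent with those settings and returns the evaluation score. `grid_search` is likewise an illustrative name, and a grid is only one of several reasonable search strategies.

```python
from itertools import product


def grid_search(train_and_score, alphas, eps_decays):
    """Score every (alpha, eps_decay) pair and return results sorted best-first."""
    results = []
    for alpha, eps_decay in product(alphas, eps_decays):
        # train_and_score is a hypothetical user-supplied function.
        score = train_and_score(alpha=alpha, eps_decay=eps_decay)
        results.append((score, alpha, eps_decay))
    # Higher average return is better, so sort by score in descending order.
    return sorted(results, key=lambda r: r[0], reverse=True)
```

Each entry of the returned list is a `(score, alpha, eps_decay)` tuple, which gives a ready-made table for the 200-word note. Because every grid point is a full training run, a handful of values per hyperparameter is usually as much as is practical.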

Done?

Submit per the cohort schedule. Peer-review pairings are announced the following Monday.