SAE Feature Browser
Sparse Autoencoder: features discovered from model activations
Mechanistic Interpretability via Dictionary Learning.
The features below were extracted by training a sparse autoencoder (SAE) on the model's internal activation vectors.
Each feature is a learned direction in activation space that ideally corresponds to a concept the model uses internally.
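To make "a learned direction in activation space" concrete, here is a minimal NumPy sketch of a sparse autoencoder's forward pass and loss. This page does not state the architecture actually used, so the ReLU encoder, the L1 sparsity penalty, the sizes, and all names (`W_enc`, `W_dec`, `sae_loss`, `l1_coeff`) are assumptions chosen for illustration. The weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the dictionary has more features than activation dimensions.
d_model, n_features = 16, 64

# Randomly initialised parameters (a real SAE would learn these by gradient descent).
W_enc = rng.normal(0.0, 0.1, size=(d_model, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(0.0, 0.1, size=(n_features, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # each row is a unit-norm feature direction
b_dec = np.zeros(d_model)


def encode(x):
    """Map activations to non-negative feature activations (one per feature)."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)


def decode(f):
    """Reconstruct activations as a weighted sum of feature directions."""
    return f @ W_dec + b_dec


def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that pushes activations toward sparsity."""
    f = encode(x)
    x_hat = decode(f)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity


# Stand-in for a batch of model activations (random data, not real activations).
x = rng.normal(size=(8, d_model))
f = encode(x)
loss = sae_loss(x)
```

In this sketch, row `i` of `W_dec` is the direction for feature `i`, and `f[:, i]` says how strongly each input expresses it.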
Click any feature to see which inputs activate it most strongly.
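The lookup behind "which inputs activate it most strongly" can be sketched as a top-k ranking over one column of a feature activation matrix. How this browser computes it is not documented here; the function name `top_activating_inputs`, the example inputs, and the activation values below are all made up for illustration.

```python
import numpy as np


def top_activating_inputs(feature_acts, inputs, feature_idx, k=3):
    """Return the k inputs with the highest activation for one feature.

    feature_acts: array of shape (n_inputs, n_features)
    inputs:       list of n_inputs items (e.g. text snippets)
    """
    acts = feature_acts[:, feature_idx]
    # Sort descending by activation; a stable sort keeps ties in input order.
    order = np.argsort(-acts, kind="stable")[:k]
    return [(inputs[i], float(acts[i])) for i in order]


# Illustrative data only: 4 inputs, 2 features.
inputs = ["the cat sat", "def main():", "Paris is in France", "x = x + 1"]
feature_acts = np.array([
    [0.0, 0.2],
    [0.9, 0.0],
    [0.0, 1.5],
    [0.7, 0.1],
])

top = top_activating_inputs(feature_acts, inputs, feature_idx=0, k=2)
```

With this toy data, feature 0 fires on the two code-like snippets, which is the kind of pattern a reader would look for when interpreting a feature.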