Neel Nanda: Mechanistic Interpretability and Superposition



This is a talk I gave to my MATS 9.0 training scholars about the big picture of mech interp - as of Oct 2025, what had changed?

This is a talk I gave to my MATS scholars, with a stylised history of the field of ...

How can we reverse engineer what a neural network is doing? In this IASEAI '25 session, An Introduction to ...

Art by ...

Clipped from episode 19 of AXRP: Transcript of that episode: ...

Part 1 of a walkthrough of our paper, Progress Measures for Grokking via ...

A walkthrough of Anthropic's Paper, A Toy Model of ...

A talk I gave to my MATS 9.0 training program about reasoning model ...

Warning: This is an ad-libbed talk, and I'm sure I got some facts wrong. This is a talk I gave to my MATS 9.0 training program on ...

When Anthropic tested Claude Sonnet 4.5 for alignment, the model appeared perfectly behaved - but it turned out the model had ...

How good are we at understanding the internal computation of advanced machine learning models, and do we have a hope at ...

See part 2 here: Implementing GPT-2 from Scratch Template notebook: ...