Mechanistic Interpretability Neel Nanda DeepMind

Media Summary: How can we reverse engineer what a neural network is doing? In this IASEAI '25 session, An Introduction to ... We don't know how AIs think or why they do what they do. Or at least, we don't know much. That fact is only becoming more ...

Mechanistic Interpretability Neel Nanda DeepMind - Detailed Analysis & Overview

This is a talk I gave to my MATS 9.0 training scholars about the big picture of mech interp - as of Oct 2025, what had changed? Art by ... Clipped from episode 19 of AXRP: Transcript of that episode: ...

This is a talk I gave to my MATS scholars, with a stylised history of the field of ... Part 1 of a walkthrough of our paper, Progress Measures for Grokking via ... A talk I gave to my MATS 9.0 training program about reasoning model ... When Anthropic tested Claude Sonnet 4.5 for alignment, the model appeared perfectly behaved, but it turned out the model had ... How can we look inside neural networks and figure out how they do what they do? This is likely to be very important for alignment ...