How To Fail Interpretability Research

Media Summary: Been Kim (Google Brain) Emerging Challenges in Deep Learning. A surprising fact about modern large language models is that nobody really knows how they work internally. At Anthropic, the ... Stanford AI Lab Faculty Lunch, November 7, 2025. Updated version of 0:59 ...


Been Kim (Google Brain) Frontiers of Deep Learning. This is a talk I gave to my MATS 9.0 training scholars about the big picture of mech interp - as of Oct 2025, what had changed?

A talk I gave to my MATS 9.0 training program about reasoning model ... MIT 6.S897 Machine Learning for Healthcare, Spring 2019. Instructor: Peter Szolovits. ... simple activation steering proved more effective than complex methods, Nanda argues for grounding ... and discuss the technical challenges they encountered in scaling our ... Solving AI Doomerism: ...

MIT 6.874 Lecture 5, Spring 2020. ...