How To Fail Interpretability Research - Detailed Analysis & Overview

How to Fail Interpretability Research

Been Kim (Google Brain) https://simons.berkeley.edu/talks/tba-90 Emerging Challenges in Deep Learning.

What is interpretability?

A surprising fact about modern large language models is that nobody really knows how they work internally. At Anthropic, the ...

Assessing skeptical views of interpretability research

Stanford AI Lab Faculty Lunch, November 7, 2025. Updated version of https://web.stanford.edu/~cgpotts/blog/interp/ 0:59 ...

Interpretability: Understanding how AI models think

Read more about Anthropic's

A Roadmap for the Rigorous Science of Interpretability | Finale Doshi-Velez | Talks at Google

With a growing interest in

Interpretability - now what?

Been Kim (Google Brain) https://simons.berkeley.edu/talks/tbd-72 Frontiers of Deep Learning.

What Matters Right Now In Mechanistic Interpretability?

This is a talk I gave to my MATS 9.0 training scholars about the big picture of mech interp - as of Oct 2025, what had changed?

The Dark Matter of AI [Mechanistic Interpretability]

Take your personal data back with Incogni! Use code WELCHLABS at the link below and get 60% off an annual plan: ...

How Reasoning Models Break Mechanistic Interpretability Techniques

A talk I gave to my MATS 9.0 training program about reasoning model

Challenging common interpretability assumptions in feature attribution explanations

Paper https://arxiv.org/abs/2012.02748 Code https://git.sr.ht/~hyphaebeast/challenging-xai Demo ...

25. Interpretability

MIT 6.S897 Machine Learning for Healthcare, Spring 2019 Instructor: Peter Szolovits View the complete course: ...

Neel Nanda - Our Pivot To Pragmatic Interpretability [Alignment Workshop]

... simple activation steering proved more effective than complex methods, Nanda argues for grounding

Scaling interpretability

... and discuss the technical challenges they encountered in scaling our

Mechanistic Interpretability explained | Chris Olah and Lex Fridman

Lex Fridman Podcast full episode: https://www.youtube.com/watch?v=ugvHCXCOmm4 Thank you for listening ❤ Check out our ...

Reading AI's Mind - Mechanistic Interpretability Explained [Anthropic Research]

Check out Gradient now and redeem your free 5$ credits! https://gradient.1stcollab.com/bycloud Solving AI Doomerism: ...

MIT Deep Learning Genomics - Lecture 5 - Model Interpretability (Spring 2020)

MIT 6.874 Lecture 5. Spring 2020 Course website: https://mit6874.github.io/ Lecture slides: ...

Part 2: 5. Interpretability

Neel Nanda discusses mechanistic

Interpretable vs Explainable Machine Learning

Interpretable

Explaining in the Dark: Perils of Interpretability Without Training Data (Sameer Singh)

Explaining in the Dark: Perils of
