Let's start our musings on sparsity with Lex Fridman's year-old podcast with the Anthropic people (Dario Amodei, Amanda Askell, and Chris Olah): lexfridman.com/dario-amodei-transcript (Nov 11, 2024).
It's all very interesting (especially if one wants to understand how these Anthropic people think, and why the company is doing what it is doing), but I am going to focus on what Chris Olah is saying.
> (4:18, 4 hours 18 minutes) We have these neural network architectures that we design and we have these loss objectives that we create. And the neural network architecture, it’s kind of like a scaffold that the circuits grow on. It starts off with some random things, and it grows, and it’s almost like the objective that we train for is this light. And so we create the scaffold that it grows on, and we create the light that it grows towards. But the thing that we actually create, it’s this almost biological entity or organism that we’re studying.
But the key parts related to sparsity come later in the interview:
> (4:42) there’s this amazing thing in mathematics called compressed sensing, and it’s actually this very surprising fact where if you have a high dimensional space and you project it into a low dimensional space, ordinarily you can’t go and sort of un-project it and get back your high dimensional vector, you threw information away. This is like you can’t invert a rectangular matrix. You can only invert square matrices. But it turns out that that’s actually not quite true. If I tell you that the high-dimensional vector was sparse, so it’s mostly zeros, then it turns out that you can often go and find back the high-dimensional vector with very high probability.
>
> So that’s a surprising fact, right? It says that you can have this high-dimensional vector space, and as long as things are sparse, you can project it down, you can have a lower-dimensional projection of it, and that works. So the superposition hypothesis is saying that that’s what’s going on in neural networks, for instance, that’s what’s going on in word embeddings.
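The compressed-sensing fact he describes is easy to check numerically. Below is a minimal sketch (my own illustration, not anything from the interview; it assumes NumPy and scikit-learn, and the dimensions and the orthogonal-matching-pursuit solver are arbitrary choices): a 5-sparse vector in a 1000-dimensional space is recovered almost exactly from only 100 random linear measurements.

```python
# Minimal numerical check of the compressed-sensing fact: a k-sparse
# high-dimensional vector can be recovered from m << n random projections.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, m, k = 1000, 100, 5                    # ambient dim, measurements, nonzeros

# A random k-sparse "high-dimensional" vector x.
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.normal(size=k)

# A random wide projection A: R^n -> R^m (the "rectangular matrix").
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x                                 # the low-dimensional "shadow" of x

# Recover x from y by exploiting sparsity (orthogonal matching pursuit).
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
x_hat = omp.fit(A, y).coef_

print("support recovered:", set(np.flatnonzero(x_hat)) == set(support))
print("max abs error    :", np.abs(x_hat - x).max())   # ~1e-15 in typical runs
```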
My corrections of what look like defects in the transcript are in bold.
> (4:44) And similarly, when we’re talking about neurons, you can have many more concepts than you have neurons. So that’s at a high level, the superposition hypothesis. Now it has this even wilder implication, which is to go and say that neural networks, it may not just be the case that the representations are like this, but the computation may also be like this. The connections between all of them. And so in some sense, neural networks may be shadows of much larger sparser neural networks. And what we see are these projections. And the strongest version of superposition hypothesis would be to take that really seriously and sort of say there actually is in some sense this upstairs model where the neurons are really sparse and all interpretable, and the weights between them are these really sparse circuits. And that’s what we’re studying. And the thing that we’re observing is the shadow of evidence. We need to find the original object.
The paragraph above is the key one. Chris has the following conjecture: our models are compressed, somewhat distorted, and very efficient representations of a very big "ideal" ultra-sparse model whose structure and nature are currently unknown to us.
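To make the "more concepts than neurons" picture concrete, here is a toy sketch (my own illustration, not Anthropic's toy-models setup; all sizes are arbitrary): pack 2000 "concept" directions into a 512-dimensional space, superpose a sparse handful of them, and check that the active concepts can still be read off with plain dot products.

```python
# Toy illustration of superposition: far more "concepts" than dimensions,
# each concept a random direction, sparse activations still decodable.
import numpy as np

rng = np.random.default_rng(1)
d, n_concepts, k_active = 512, 2000, 4    # "neurons", concepts, active concepts

# One random unit direction per concept, all packed into a d-dim space.
W = rng.normal(size=(n_concepts, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Distinct random directions in 512 dimensions are nearly orthogonal.
G = np.abs(W @ W.T)
np.fill_diagonal(G, 0.0)
print("max |cos| between distinct concepts:", G.max())   # small, roughly 0.2 here

# Activate a sparse subset of concepts and superpose their directions.
active = np.sort(rng.choice(n_concepts, size=k_active, replace=False))
activation = W[active].sum(axis=0)        # the d-dimensional "neuron" vector

# Decode by dot product: active concepts score near 1, the rest near 0.
scores = W @ activation
decoded = np.sort(np.argsort(scores)[-k_active:])
print("active concepts :", active)
print("decoded concepts:", decoded)       # matches with overwhelming probability
```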
> (4:45) [the process of learning] is finding how to fit it efficiently or something like this. The gradient descent is doing this and in fact, so this sort of says that gradient descent, it could just represent a dense neural network, but it sort of says that gradient descent is implicitly searching over the space of extremely sparse models that could be projected into this low-dimensional space. And this large body of work of people going and trying to study sparse neural networks where you go and you have… you could design neural networks where the edges are sparse and the activations are sparse.
>
> And my sense is that work has generally, it feels very principled, it makes so much sense, and yet that work hasn’t really panned out that well, is my impression broadly. And I think that a potential answer for that is that actually the neural network is already sparse in some sense. You were trying to go and do this. Gradient descent was actually behind the scenes going and searching more efficiently than you could through the space of sparse models and going and learning whatever sparse model was most efficient. And then figuring out how to fold it down nicely to go and run conveniently on your GPU, which does as nice dense matrix multiplies. And that you just can’t beat that.
So he is also saying that this "ideal ultra-sparse model" would have been too inefficient on our present hardware, and that there are deep reasons for all this beautiful sparsity research not bearing much fruit.
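The hardware point is easy to probe on an ordinary machine. The sketch below (my own experiment, not his; it assumes NumPy and SciPy, and the sizes and the 90% unstructured sparsity are arbitrary) times a dense layer against a sparse version of the same layer; depending on your CPU, BLAS build, and sparsity level, the dense path often wins or comes close despite doing roughly ten times the arithmetic, and on GPUs dense GEMM is harder still to beat.

```python
# Rough benchmark: dense matmul vs. unstructured-sparse matmul for one layer.
import time
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(2)
d_in, d_out, batch, sparsity = 4096, 4096, 256, 0.90

W_dense = rng.normal(size=(d_out, d_in)).astype(np.float32)
mask = rng.random((d_out, d_in)) > sparsity        # keep roughly 10% of weights
W_sparse = sp.csr_matrix(W_dense * mask)           # unstructured sparse weights
X = rng.normal(size=(d_in, batch)).astype(np.float32)

def bench(fn, reps=10):
    fn()                                           # warm-up
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) / reps

print(f"dense  layer: {bench(lambda: W_dense @ X) * 1e3:.2f} ms")
print(f"sparse layer: {bench(lambda: W_sparse @ X) * 1e3:.2f} ms")
```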
So he is suggesting that his research gives us a glimpse into that "ideal large ultra-sparse model", but that we can't use that large background "ideal model" to boost capability. We'll be revisiting this assumption later.
He also talks about how it would be desirable to identify "functional organs" (either in our models, or in those "ideal big ultra-sparse models" which are projected onto our models; see the first comment to this post).
From the first comment to this post, quoting a later part of the same transcript:

> Chris Olah (05:08:29) Yeah, exactly. And I mean, if you think about science, right? A lot of scientific fields investigate things at many **levels** of abstraction. In biology, you have molecular biology studying proteins and molecules and so on, and they have cellular biology, and then you have histology studying tissues, and then you have anatomy, and then you have zoology, and then you have ecology. And so you have many, many levels of abstraction or physics, maybe you have a physics of individual particles, and then statistical physics gives you thermodynamics and things like this. And so you often have different levels of abstraction.
>
> (05:09:01) And I think that right now we have mechanistic interpretability, if it succeeds, is sort of like a microbiology of neural networks, but we want something more like anatomy. And a question you might ask is, “Why can’t you just go there directly?” And I think the answer is **superposition**, at least in significant part. It’s that it’s actually very hard to see this macroscopic structure without first sort of breaking down the microscopic structure in the right way and then studying how it connects together. But I’m hopeful that there is going to be something much larger than features and circuits and that we’re going to be able to have a story that involves much bigger things. And then you can sort of study in detail the parts you care about.