Anthropic
@AnthropicAI
Last year, we conjectured that polysemanticity is caused by "superposition" – models compressing many rare concepts into a small number of neurons. We also conjectured that "dictionary learning" might be able to undo superposition.
https://x.com/AnthropicAI/status/1570087876053942272
Neural networks often pack many unrelated concepts into a single neuron – a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging. In our latest work, we build toy models where the origins of polysemanticity can be fully understood.
– @AnthropicAI
Oct 5, 2023, 5:40:34 PM
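The thread's core claim, that many rare sparse features get packed into fewer neurons ("superposition") and that dictionary learning might recover them, can be illustrated with a toy sketch. Everything below is an illustrative assumption rather than Anthropic's actual method: the dimensions, the sparsity rate, and the use of scikit-learn's DictionaryLearning as the sparse-coding solver are all choices made for this example.

```python
# Toy sketch of superposition + dictionary learning. Not Anthropic's setup;
# all dimensions and parameters here are illustrative assumptions.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_features, n_neurons, n_samples = 64, 16, 512

# Sparse "concepts": each sample activates only a handful of many rare features.
mask = rng.random((n_samples, n_features)) < 0.05
features = mask * rng.random((n_samples, n_features))

# Superposition: a fixed linear map compresses 64 features into 16 neurons.
W = rng.normal(size=(n_features, n_neurons)) / np.sqrt(n_neurons)
activations = features @ W  # what we would observe inside the model

# Dictionary learning: seek an overcomplete basis (64 atoms in a
# 16-dimensional space) whose sparse combinations explain the activations.
dl = DictionaryLearning(n_components=n_features, alpha=0.5,
                        max_iter=100, random_state=0)
codes = dl.fit_transform(activations)
print("mean active atoms per sample:",
      (np.abs(codes) > 1e-6).sum(axis=1).mean())

# How well do the learned atoms align with the true feature directions?
atoms = dl.components_ / np.linalg.norm(dl.components_, axis=1, keepdims=True)
truth = W / np.linalg.norm(W, axis=1, keepdims=True)
sims = np.abs(truth @ atoms.T).max(axis=1)
print("median |cos| between true features and best-matching atom:",
      np.median(sims))
```

If the recovered atoms line up with the rows of W, the sparse code has "undone" the superposition in this toy: each atom corresponds to one of the original rare features, even though there are four times as many features as neurons.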