News

A stacked sparse autoencoder is a deep learning architecture composed of multiple autoencoder layers, each extracting features at a different level of abstraction.
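As a rough sketch of that idea (the class name, layer sizes, and L1 coefficient below are illustrative, not taken from any of the papers mentioned here):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """One autoencoder layer: encode the input into a wider, sparsely
    active code, then decode the code back into a reconstruction."""
    def __init__(self, d_in: int, d_code: int):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_code)
        self.decoder = nn.Linear(d_code, d_in)

    def forward(self, x):
        code = torch.relu(self.encoder(x))   # sparse feature activations
        recon = self.decoder(code)           # reconstruction of the input
        return code, recon

# Stacking: each layer consumes the codes of the one before it, so deeper
# layers extract features at higher levels of abstraction.
sae1 = SparseAutoencoder(d_in=256, d_code=1024)
sae2 = SparseAutoencoder(d_in=1024, d_code=4096)

x = torch.randn(8, 256)
code1, recon1 = sae1(x)
code2, recon2 = sae2(code1)

# Sparsity is typically encouraged with an L1 penalty on the codes.
loss = nn.functional.mse_loss(recon1, x) + 1e-3 * code1.abs().mean()
```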
New research by Google DeepMind shows how sparse autoencoders (SAEs) with a JumpReLU activation can help interpret LLMs.
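At its core, JumpReLU is a ReLU whose cutoff sits at a learnable threshold θ rather than at zero: pre-activations at or below θ are zeroed, which pushes the autoencoder toward sparser feature codes. A minimal sketch (the published method learns θ per feature and trains it with straight-through estimators, which this omits):

```python
import torch

def jump_relu(z: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """JumpReLU: pass values through unchanged where they exceed the
    threshold theta, and output zero everywhere else."""
    return z * (z > theta).to(z.dtype)

# With theta = 0.5, small positive activations are suppressed, not just negatives.
z = torch.tensor([-1.0, 0.2, 0.6, 2.0])
print(jump_relu(z, torch.tensor(0.5)))  # tensor([0.0000, 0.0000, 0.6000, 2.0000])
```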
Anthropic opened a window into the ‘black box’ where ‘features’ steer a large language model’s output. OpenAI dug into the same concept two weeks later with a deep dive into sparse autoencoders.
To find features (categories of data that represent a larger concept) in its AI model Gemma, DeepMind ran a tool known as a “sparse autoencoder” on each of the model’s layers.
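A toy illustration of that per-layer setup (the stand-in model, shapes, and encoders below are made up; in practice each SAE is trained separately on activations captured from the real model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for a language model: a stack of blocks whose activations we probe.
blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(4)])
# One sparse-autoencoder encoder per layer, widening 64 dims to 512 features.
sae_encoders = nn.ModuleList([nn.Linear(64, 512) for _ in range(4)])

x = torch.randn(8, 64)                    # batch of activations entering layer 0
features_per_layer = []
for block, enc in zip(blocks, sae_encoders):
    x = torch.relu(block(x))              # this layer's activations
    features_per_layer.append(torch.relu(enc(x)))  # sparse features for this layer
```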
The researchers proposed, and subsequently tried, various workarounds, achieving good results in 2023 on very small language models with a so-called “sparse autoencoder”.