Bryce Hepner


2025

Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning
Jeffrey Olmo | Jared Wilson | Max Forsey | Bryce Hepner | Thomas Vincent Howe | David Wingate
Findings of the Association for Computational Linguistics: NAACL 2025

Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations by learning a sparse and overcomplete decomposition of the network’s internal activations. However, SAEs are traditionally trained considering only activation values and not the effect those activations have on downstream computations. This limits the information available to learn features, and biases the autoencoder towards neglecting features that are represented with small activation values but strongly influence model outputs. To address this, we introduce Gradient SAEs (g-SAEs), which modify the k-sparse autoencoder architecture by augmenting the TopK activation function to rely on the gradients of the input activation when selecting the k elements. For a given sparsity level, g-SAEs produce reconstructions that are more faithful to original network performance when propagated through the network. Additionally, we find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts. By considering the downstream effects of activations, our approach leverages the dual nature of neural network features as both representations, retrospectively, and actions, prospectively. While previous methods have approached the problem of feature discovery primarily focused on the former aspect, g-SAEs represent a step towards accounting for the latter as well.
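To make the selection mechanism concrete, below is a minimal PyTorch sketch of a gradient-aware TopK step. The specific scoring rule (pre-activation scaled by the gradient routed through the decoder), the class name, and the argument names are illustrative assumptions based on the abstract, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class GradientTopKSAE(nn.Module):
    """Sketch of a TopK SAE whose k-selection also uses gradient information."""

    def __init__(self, d_model: int, d_hidden: int, k: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)
        self.k = k

    def forward(self, x: torch.Tensor, grad_wrt_x: torch.Tensor) -> torch.Tensor:
        # Pre-activations for every dictionary latent.
        pre = self.encoder(x)                          # (batch, d_hidden)
        # Route the downstream gradient through the decoder so each latent is
        # scored by its estimated effect on model outputs, not just its size.
        # (Assumed scoring rule, for illustration only.)
        downstream = grad_wrt_x @ self.decoder.weight  # (batch, d_hidden)
        scores = (pre * downstream).abs()
        # Keep only the k highest-scoring latents, as in a standard TopK SAE.
        topk = torch.topk(scores, self.k, dim=-1)
        mask = torch.zeros_like(pre).scatter_(-1, topk.indices, 1.0)
        return self.decoder(pre * mask)
```

In this sketch `grad_wrt_x` would be the gradient of some downstream loss or logit with respect to the input activations; a plain TopK SAE is recovered by scoring on `pre.abs()` alone.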

2024

WhatIf: Leveraging Word Vectors for Small-Scale Data Augmentation
Alex Lyman | Bryce Hepner
The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning

We introduce WhatIf, a lightly supervised data augmentation technique that leverages word vectors to enhance training data for small-scale language models. Inspired by reading prediction strategies used in education, WhatIf creates new samples by substituting semantically similar words in the training data. We evaluate WhatIf on multiple datasets, demonstrating small but consistent improvements in downstream evaluation compared to baseline models. Finally, we compare WhatIf to other small-scale data augmentation techniques and find that it provides comparable quantitative results, with a potential tradeoff in qualitative evaluation.
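A minimal NumPy sketch of the word-vector substitution idea follows; the function names, swap probability, and similarity threshold are assumptions for illustration, not the paper's exact procedure, and a real run would load pretrained embeddings such as word2vec or GloVe.

```python
import random
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))


def augment_sentence(tokens, vectors, p_swap=0.15, min_sim=0.6):
    """Replace some tokens with their nearest neighbour in embedding space.

    `vectors` maps words to embedding vectors; p_swap and min_sim are
    illustrative defaults, not values from the paper.
    """
    out = []
    for tok in tokens:
        if tok in vectors and random.random() < p_swap:
            # Pick the most similar other word above the similarity threshold.
            best, best_sim = None, min_sim
            for cand, vec in vectors.items():
                if cand == tok:
                    continue
                sim = cosine(vectors[tok], vec)
                if sim > best_sim:
                    best, best_sim = cand, sim
            out.append(best if best is not None else tok)
        else:
            out.append(tok)
    return out


# Toy usage with made-up vectors.
vecs = {"cat": np.array([1.0, 0.1]), "kitten": np.array([0.9, 0.2]),
        "walked": np.array([0.0, 1.0]), "strolled": np.array([0.1, 0.95])}
print(augment_sentence(["the", "cat", "walked", "home"], vecs, p_swap=1.0))
```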