Jaehyuk Lim


2024

Accelerating Sparse Autoencoder Training via Layer-Wise Transfer Learning in Large Language Models
Davide Ghilardi | Federico Belotti | Marco Molinari | Jaehyuk Lim
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Sparse AutoEncoders (SAEs) have gained popularity as a tool for enhancing the interpretability of Large Language Models (LLMs). However, training SAEs can be computationally intensive, especially as model complexity grows. In this study, we explore the potential of transfer learning to accelerate SAE training by capitalizing on the shared representations found across adjacent layers of LLMs. Our experimental results demonstrate that fine-tuning SAEs using pre-trained models from nearby layers not only maintains but often improves the quality of learned representations, while significantly accelerating convergence. These findings indicate that the strategic reuse of pretrained SAEs is a promising approach, particularly in settings where computational resources are constrained.
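As a rough illustration of the idea in the abstract above, and not the authors' implementation, the sketch below trains a small ReLU sparse autoencoder on toy "layer k" activations, then copies its weights to initialize an SAE that is fine-tuned on "layer k+1" activations. The architecture, the L1 penalty, the toy data, and the names `init_sae`, `loss_and_grads`, `train`, and `transfer_init` are all assumptions made for this example.

```python
import numpy as np


def init_sae(d_model, d_hidden, rng):
    """Randomly initialize a one-hidden-layer ReLU sparse autoencoder."""
    w_enc = rng.normal(0.0, 0.1, size=(d_model, d_hidden))
    return {
        "W_enc": w_enc,
        "b_enc": np.zeros(d_hidden),
        "W_dec": w_enc.T.copy(),  # decoder starts as the encoder transpose
        "b_dec": np.zeros(d_model),
    }


def loss_and_grads(params, x, l1_coeff=1e-3):
    """Reconstruction MSE plus L1 sparsity penalty, with manual gradients."""
    n = x.shape[0]
    pre = x @ params["W_enc"] + params["b_enc"]
    h = np.maximum(pre, 0.0)  # sparse code (non-negative after ReLU)
    x_hat = h @ params["W_dec"] + params["b_dec"]
    err = x_hat - x
    loss = (err ** 2).sum() / n + l1_coeff * h.sum() / n

    d_xhat = 2.0 * err / n
    d_h = d_xhat @ params["W_dec"].T + l1_coeff / n
    d_pre = d_h * (pre > 0)  # ReLU passes gradient only where it is active
    grads = {
        "W_enc": x.T @ d_pre,
        "b_enc": d_pre.sum(axis=0),
        "W_dec": h.T @ d_xhat,
        "b_dec": d_xhat.sum(axis=0),
    }
    return loss, grads


def train(params, x, steps=50, lr=0.01):
    """Full-batch gradient descent; updates params in place."""
    for _ in range(steps):
        _, grads = loss_and_grads(params, x)
        for name in params:
            params[name] -= lr * grads[name]
    return params


def transfer_init(source_params):
    """Initialize a new SAE from a trained one by copying its weights."""
    return {name: value.copy() for name, value in source_params.items()}


rng = np.random.default_rng(0)
# Toy stand-ins for activations of two adjacent layers (assumption:
# adjacent layers are similar, modeled here as a small perturbation).
acts_k = rng.normal(size=(64, 8))
acts_k1 = acts_k + 0.1 * rng.normal(size=(64, 8))

sae_k = train(init_sae(8, 16, rng), acts_k)          # train on layer k
sae_k1 = train(transfer_init(sae_k), acts_k1)        # fine-tune on layer k+1
```

Comparing the loss curve of `sae_k1` with that of a randomly initialized SAE trained on `acts_k1` would be the natural next step, but the toy data here says nothing about the results reported in the paper.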

Language Models Don’t Learn the Physical Manifestation of Language
Bruce Lee | Jaehyuk Lim
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We argue that language-only models don’t learn the physical manifestation of language. We present an empirical investigation of visual-auditory properties of language through a series of tasks, termed H-Test. These tasks highlight a fundamental gap between human linguistic understanding and the sensory-deprived linguistic understanding of LLMs. In support of our hypothesis, 1. deliberate reasoning (Chain-of-Thought), 2. few-shot examples, or 3. a stronger LLM from the same model family (LLaMA 2 13B -> LLaMA 2 70B) has no significant effect on H-Test performance. We bring in the philosophical case of Mary, who learns about the world in a sensory-deprived environment, as a useful conceptual framework to understand how language-only models learn about the world (Jackson, 1986). Our experiments show that some of the strongest proprietary LLMs stay near the random-chance baseline accuracy of 50%, highlighting the limitations of linguistic knowledge acquired in the absence of sensory experience. Our code and data are available at <github.com/brucewlee/h-test>.