Toward Efficient Sparse Autoencoder-Guided Steering for Improved In-Context Learning in Large Language Models

Ikhyun Cho, Julia Hockenmaier


Abstract
Sparse autoencoders (SAEs) have emerged as a powerful analytical tool in mechanistic interpretability for large language models (LLMs), with growing success in applications beyond interpretability. Building on this momentum, we present a novel approach that leverages SAEs to enhance the general in-context learning (ICL) performance of LLMs. Specifically, we introduce Feature Detection through Prompt Variation (FDPV), which leverages the SAE’s remarkable ability to capture subtle differences between prompts, enabling efficient feature selection for downstream steering. In addition, we propose a novel steering method tailored to ICL—Selective In-Context Steering (SISTER)—grounded in recent insights from ICL research that LLMs utilize label words as key anchors. Our method yields a 3.5% average performance improvement across diverse text classification tasks and exhibits greater robustness to hyperparameter variations compared to standard steering approaches. Our code is available at https://github.com/ihcho2/SAE-ICL.
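The sketch below illustrates the general idea described in the abstract: selecting SAE features that differ most between two prompt variants, then steering hidden states only at label-word positions. It is a minimal illustration, not the paper's exact FDPV or SISTER procedure; the `sae.encode` and `sae.decoder_weight` interface is a hypothetical SAE wrapper assumed for exposition.

```python
import torch

def select_features_by_prompt_variation(sae, hidden_a, hidden_b, top_k=10):
    """Pick SAE features whose mean activations differ most between two prompt variants.

    hidden_a / hidden_b: (seq_len, hidden_dim) residual-stream states for each prompt.
    Assumes sae.encode returns (seq_len, n_features) sparse activations.
    """
    acts_a = sae.encode(hidden_a)
    acts_b = sae.encode(hidden_b)
    diff = (acts_a.mean(dim=0) - acts_b.mean(dim=0)).abs()
    return torch.topk(diff, top_k).indices  # indices of the most contrastive features

def steer_at_label_positions(hidden, label_positions, sae, feature_ids, alpha=2.0):
    """Add the selected SAE decoder directions only at label-word token positions.

    Assumes sae.decoder_weight has shape (hidden_dim, n_features).
    """
    steering = sae.decoder_weight[:, feature_ids].sum(dim=-1)  # (hidden_dim,)
    hidden = hidden.clone()
    hidden[label_positions] += alpha * steering
    return hidden
```

Restricting the intervention to label-word positions reflects the abstract's point that label words act as anchors in ICL; the scaling factor `alpha` here is an assumed hyperparameter, not a value reported in the paper.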
Anthology ID:
2025.emnlp-main.1474
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
28949–28961
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1474/
Cite (ACL):
Ikhyun Cho and Julia Hockenmaier. 2025. Toward Efficient Sparse Autoencoder-Guided Steering for Improved In-Context Learning in Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 28949–28961, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Toward Efficient Sparse Autoencoder-Guided Steering for Improved In-Context Learning in Large Language Models (Cho & Hockenmaier, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1474.pdf
Checklist:
 2025.emnlp-main.1474.checklist.pdf