Evaluating the Impact of SAE-based Language Steering on LLM Performance

Sebastian Zwirner; Wentao Hu; Koshiro Aoki; Daisuke Kawahara

Abstract

Recent advances in Sparse Autoencoders (SAEs) have revealed interpretable features within large language models (LLMs), including features that are specific to individual languages. In prior work, these features have been used to steer a model’s output language. However, the impact of SAE-based language steering on output quality and task performance, as well as its relationship to simpler prompting-based approaches, remains unclear. In this work, we study the effects of language steering using SAE features across multiple tasks and models. We apply language-specific SAE feature steering to three LLMs from two model families and evaluate it on a translation task and a multilingual question-answering task. We compare SAE-based steering against prompting and language neuron-based steering, and examine a combined prompting-and-steering approach. On the translation task, SAE feature steering achieves an average target-language accuracy of 92% across models and languages, consistently outperforming language neuron-based steering, but slightly underperforming prompting in language accuracy and output quality. In contrast, on the multilingual question-answering task, SAE-based steering enables stronger language control than prompting, and combining steering with prompting yields the best overall language control and task performance. These findings demonstrate the potential of SAE features as a tool for controllable multilingual generation.

Anthology ID: 2026.eacl-srw.43
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Selene Baez Santamaria, Sai Ashish Somayajula, Atsuki Yamaguchi
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 555–568
URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.43/
Cite (ACL): Sebastian Zwirner, Wentao Hu, Koshiro Aoki, and Daisuke Kawahara. 2026. Evaluating the Impact of SAE-based Language Steering on LLM Performance. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 555–568, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Evaluating the Impact of SAE-based Language Steering on LLM Performance (Zwirner et al., EACL 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.43.pdf
