AdaptiveK: Complexity-Driven Sparse Autoencoders for Interpretable Language Model Representations

Yifei Yao; Hanrong Zhang; Mengnan Du

Abstract

Understanding the internal representations of large language models (LLMs) remains a central challenge for interpretability research. Sparse autoencoders (SAEs) offer a promising solution by decomposing activations into interpretable features, but existing approaches rely on fixed sparsity constraints that fail to account for input complexity. We propose AdaptiveK SAE (Adaptive Top K Sparse Autoencoders), a novel framework that dynamically adjusts the sparsity level based on the semantic complexity of each input. Using linear probes, we show that context complexity is linearly encoded in LLM representations, and we use this signal to guide feature allocation during training. Experiments across ten language models show that this complexity-driven adaptation outperforms fixed-sparsity approaches on reconstruction fidelity, explained variance, cosine similarity, and interpretability metrics, while eliminating the burden of extensive hyperparameter tuning. Our code is available at: https://github.com/hiyukie/adaptiveK.
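
As a rough illustration of the idea in the abstract, the sketch below lets a linear probe score each input and turns that score into a per-input Top K budget. It is not taken from the paper or its repository: the function names, the sigmoid mapping from score to k, the bounds k_min and k_max, and the toy weights are all assumptions made for this example.

```python
import math


def probe_score(activation, probe_weights, probe_bias=0.0):
    """Linear probe: a dot product plus bias, read as a complexity score."""
    return sum(a * w for a, w in zip(activation, probe_weights)) + probe_bias


def complexity_to_k(score, k_min=2, k_max=8):
    """Map a probe score to an integer budget in [k_min, k_max] (assumed rule)."""
    squashed = 0.5 * (1.0 + math.tanh(score / 2.0))  # numerically stable sigmoid
    return k_min + round(squashed * (k_max - k_min))


def top_k_sparsify(pre_activations, k):
    """Keep the k largest pre-activations (clipped at zero) and zero the rest."""
    order = sorted(range(len(pre_activations)),
                   key=lambda i: pre_activations[i], reverse=True)
    keep = set(order[:k])
    return [max(v, 0.0) if i in keep else 0.0
            for i, v in enumerate(pre_activations)]


def adaptive_encode(activation, encoder_rows, probe_weights, k_min=2, k_max=8):
    """Encode one activation vector with a per-input sparsity budget."""
    pre = [sum(a * w for a, w in zip(activation, row)) for row in encoder_rows]
    k = complexity_to_k(probe_score(activation, probe_weights), k_min, k_max)
    return top_k_sparsify(pre, k), k


# Toy usage with made-up numbers: 3-dim activation, 6 latent features.
encoder_rows = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
                [0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [-1.0, 0.0, 0.0]]
probe_weights = [4.0, 4.0, 4.0]  # hypothetical probe, not a trained one
simple_codes, simple_k = adaptive_encode(
    [-1.0, -1.0, -1.0], encoder_rows, probe_weights, 1, 5)
complex_codes, complex_k = adaptive_encode(
    [1.0, 1.0, 1.0], encoder_rows, probe_weights, 1, 5)
print(simple_k, complex_k)
```

In this toy setup the input that the probe scores as more complex receives a larger budget of active features, while a fixed-sparsity Top K SAE would use the same k for both inputs.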

Anthology ID: 2026.findings-acl.1187
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 23702–23728
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1187/
Cite (ACL): Yifei Yao, Hanrong Zhang, and Mengnan Du. 2026. AdaptiveK: Complexity-Driven Sparse Autoencoders for Interpretable Language Model Representations. In Findings of the Association for Computational Linguistics: ACL 2026, pages 23702–23728, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): AdaptiveK: Complexity-Driven Sparse Autoencoders for Interpretable Language Model Representations (Yao et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1187.pdf
Checklist: 2026.findings-acl.1187.checklist.pdf
