A framework for analyzing concept representations in neural models

Burin Naowarat, Hao Tang, Sharon Goldwater


Abstract
Understanding how neural models represent human-interpretable concepts is challenging. Prior work has explored linear concept subspaces from diverse perspectives, such as probing and concept erasure. We introduce a unified framework to study these subspaces along two axes: containment, which tests whether a concept is fully represented in a subspace but not outside it, and disentanglement, which tests whether the subspace is isolated from other concepts. In experiments on both text and speech models, we first highlight that concept subspaces may not be uniquely determined, and discuss the implications for concept subspace analysis. Then, we compare properties of concept subspaces estimated using five estimators proposed in different communities. We find that (1) the choice of estimator impacts the containment and disentanglement properties; (2) the state-of-the-art concept erasure method, LEACE, performs well on both testing axes, but still struggles to generalize to unseen data; and (3) in HuBERT speech representations, phone information is both contained and disentangled from speaker information, while speaker information is hard to contain in a compact subspace, despite being disentangled from phones.
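As a rough illustration of the two axes named in the abstract, the sketch below checks containment and disentanglement for a one-dimensional subspace on synthetic data. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic representations, the mean-difference direction used as the subspace estimate, the nearest-class-mean probe, and the helper names `nearest_mean_accuracy` and `subspace_report`. The paper's own estimators and evaluation protocol may differ.

```python
import numpy as np


def nearest_mean_accuracy(X, y):
    """Accuracy of a nearest-class-mean probe on binary labels.

    For brevity the probe is fit and scored on the same data.
    """
    means = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
    dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return float((dists.argmin(axis=1) == y).mean())


def subspace_report(X, y_concept, y_other, basis):
    """Probe a target concept and another concept inside/outside a subspace.

    `basis` is a (d, k) matrix with orthonormal columns (hypothetical input).
    """
    projector = basis @ basis.T      # orthogonal projector onto the subspace
    inside = X @ projector           # component inside the subspace
    outside = X - inside             # component in the orthogonal complement
    return {
        # Containment: the concept should be decodable inside the subspace ...
        "concept_inside": nearest_mean_accuracy(inside, y_concept),
        # ... and close to chance outside it.
        "concept_outside": nearest_mean_accuracy(outside, y_concept),
        # Disentanglement: another concept should be near chance inside it.
        "other_inside": nearest_mean_accuracy(inside, y_other),
    }


# Synthetic representations (assumed setup): the target concept y is written
# into dimension 0 and an independent concept z into dimension 1.
rng = np.random.default_rng(0)
n, d = 2000, 8
y = rng.integers(0, 2, n)
z = rng.integers(0, 2, n)
X = rng.normal(scale=0.5, size=(n, d))
X[:, 0] += 4.0 * y
X[:, 1] += 4.0 * z

# Illustrative subspace estimate: the normalized difference of class means.
direction = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
basis = (direction / np.linalg.norm(direction))[:, None]

report = subspace_report(X, y, z, basis)
for name, accuracy in report.items():
    print(f"{name}: {accuracy:.3f}")
```

In this toy setting, a high `concept_inside` score together with a near-chance `concept_outside` score would suggest containment, and a near-chance `other_inside` score would suggest disentanglement from the second concept. Held-out data would be needed to say anything about generalization.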
Anthology ID:
2026.conll-main.34
Volume:
Proceedings of the 30th Conference on Computational Natural Language Learning
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Claire Bonial, Yevgeni Berzak
Venues:
CoNLL | WS
Publisher:
Association for Computational Linguistics
Pages:
574–587
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.34/
Cite (ACL):
Burin Naowarat, Hao Tang, and Sharon Goldwater. 2026. A framework for analyzing concept representations in neural models. In Proceedings of the 30th Conference on Computational Natural Language Learning, pages 574–587, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
A framework for analyzing concept representations in neural models (Naowarat et al., CoNLL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.34.pdf