@inproceedings{park-etal-2025-decoding,
    title = "Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval",
    author = "Park, Seongwan  and
      Kim, Taeklim  and
      Ko, Youngjoong",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1345/",
    pages = "26479--26496",
    ISBN = "979-8-89176-332-6",
    abstract = "Despite their strong performance, Dense Passage Retrieval (DPR) models suffer from a lack of interpretability. In this work, we propose a novel interpretability framework that leverages Sparse Autoencoders (SAEs) to decompose previously uninterpretable dense embeddings from DPR models into distinct, interpretable latent concepts. We generate natural language descriptions for each latent concept, enabling human interpretations of both the dense embeddings and the query-document similarity scores of DPR models. We further introduce Concept-Level Sparse Retrieval (CL-SR), a retrieval framework that directly utilizes the extracted latent concepts as indexing units. CL-SR effectively combines the semantic expressiveness of dense embeddings with the transparency and efficiency of sparse representations. We show that CL-SR achieves high index-space and computational efficiency while maintaining robust performance across vocabulary and semantic mismatches."
}