@inproceedings{park-etal-2025-decoding,
    title = "Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval",
    author = "Park, Seongwan  and
      Kim, Taeklim  and
      Ko, Youngjoong",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1345/",
    pages = "26479--26496",
    ISBN = "979-8-89176-332-6",
    abstract = "Despite their strong performance, Dense Passage Retrieval (DPR) models suffer from a lack of interpretability. In this work, we propose a novel interpretability framework that leverages Sparse Autoencoders (SAEs) to decompose previously uninterpretable dense embeddings from DPR models into distinct, interpretable latent concepts. We generate natural language descriptions for each latent concept, enabling human interpretations of both the dense embeddings and the query-document similarity scores of DPR models. We further introduce Concept-Level Sparse Retrieval (CL-SR), a retrieval framework that directly utilizes the extracted latent concepts as indexing units. CL-SR effectively combines the semantic expressiveness of dense embeddings with the transparency and efficiency of sparse representations. We show that CL-SR achieves high index-space and computational efficiency while maintaining robust performance across vocabulary and semantic mismatches."
}