@inproceedings{lu-etal-2025-pathway,
    title     = "Pathway to Relevance: How Cross-Encoders Implement a Semantic Variant of {BM25}",
    author    = "Lu, Meng and
                 Chen, Catherine and
                 Eickhoff, Carsten",
    editor    = "Christodoulopoulos, Christos and
                 Chakraborty, Tanmoy and
                 Rose, Carolyn and
                 Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month     = nov,
    year      = "2025",
    address   = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url       = "https://aclanthology.org/2025.emnlp-main.1297/",
    pages     = "25536--25558",
    isbn      = "979-8-89176-332-6",
    abstract  = "Mechanistic interpretation has greatly contributed to a more detailed understanding of generative language models, enabling significant progress in identifying structures that implement key behaviors through interactions between internal components. In contrast, interpretability in information retrieval (IR) remains relatively coarse-grained, and much is still unknown as to how IR models determine whether a document is relevant to a query. In this work, we address this gap by mechanistically analyzing how one commonly used model, a cross-encoder, estimates relevance. We find that the model extracts traditional relevance signals, such as term frequency and inverse document frequency, in early-to-middle layers. These concepts are then combined in later layers, similar to the well-known probabilistic ranking function, BM25. Overall, our analysis offers a more nuanced understanding of how IR models compute relevance. Isolating these components lays the groundwork for future interventions that could enhance transparency, mitigate safety risks, and improve scalability."
}
Markdown (Informal)
[Pathway to Relevance: How Cross-Encoders Implement a Semantic Variant of BM25](https://aclanthology.org/2025.emnlp-main.1297/) (Lu et al., EMNLP 2025)
ACL