Language Model Re-rankers are Fooled by Lexical Similarities

Lovisa Hagström; Ercong Nie; Ruben Halifa; Helmut Schmid; Richard Johansson; Alexander Junge

Abstract

Language model (LM) re-rankers are used to refine retrieval results for retrieval-augmented generation (RAG). They are more expensive than lexical matching methods like BM25 but assumed to better process semantic information and the relations between the query and the retrieved answers. To understand whether LM re-rankers always live up to this assumption, we evaluate 6 different LM re-rankers on the NQ, LitQA2 and DRUID datasets. Our results show that LM re-rankers struggle to outperform a simple BM25 baseline on DRUID. Leveraging a novel separation metric based on BM25 scores, we explain and identify re-ranker errors stemming from lexical dissimilarities. We also investigate different methods to improve LM re-ranker performance and find these methods mainly useful for NQ. Taken together, our work identifies and explains weaknesses of LM re-rankers and points to the need for more adversarial and realistic datasets for their evaluation.

Anthology ID: 2025.fever-1.2
Volume: Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Mubashara Akhtar, Rami Aly, Christos Christodoulopoulos, Oana Cocarascu, Zhijiang Guo, Arpit Mittal, Michael Schlichtkrull, James Thorne, Andreas Vlachos
Venues: FEVER | WS
Publisher: Association for Computational Linguistics
Pages: 18–33
URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.fever-1.2/
Cite (ACL): Lovisa Hagström, Ercong Nie, Ruben Halifa, Helmut Schmid, Richard Johansson, and Alexander Junge. 2025. Language Model Re-rankers are Fooled by Lexical Similarities. In Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER), pages 18–33, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Language Model Re-rankers are Fooled by Lexical Similarities (Hagström et al., FEVER 2025)
PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.fever-1.2.pdf
