SURE or Not? Investigating Semantic Understanding in Dense Retrieval Models

Lingdi Kong; Xuanang Chen; Ben He; Le Sun

Abstract

Dense retrieval has become a core technique in applications like web search and retrieval-augmented generation. Despite the empirical success of dense retrieval models, it remains unclear whether they truly understand semantics. To address this gap, this paper conducts a systematic investigation by introducing SURE, a benchmark for Semantic Understanding in dense REtrieval built upon the MSMARCO, NQ, and FiQA datasets. SURE characterizes semantic understanding in dense retrieval along three dimensions: semantic precision, semantic abstraction, and semantic equivalence. We evaluate ten representative models ranging from 110M to 8B parameters, including both general-purpose and domain-specific models. Results show that current dense retrievers struggle to distinguish fine-grained semantic differences across texts with varying information density, and to recognize semantic consistency under lexical paraphrasing. Moreover, larger models do not necessarily exhibit stronger semantic understanding, and diverse training data generally enhances semantic understanding on challenging retrieval tasks.

Anthology ID: 2026.acl-long.2127
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 45873–45887
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2127/
Cite (ACL): Lingdi Kong, Xuanang Chen, Ben He, and Le Sun. 2026. SURE or Not? Investigating Semantic Understanding in Dense Retrieval Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 45873–45887, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SURE or Not? Investigating Semantic Understanding in Dense Retrieval Models (Kong et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2127.pdf
Checklist: 2026.acl-long.2127.checklist.pdf