Reliable Evaluation Protocol for Low-Precision Retrieval

Kisu Yang; Yoonna Jang; Hwanseok Jang; Kenneth Choi; Isabelle Augenstein; Heuiseok Lim

Abstract

Lowering the numerical precision of model parameters and computations is widely adopted to improve the efficiency of retrieval systems. However, when relevance scores between the query and documents are computed in low precision, we observe spurious ties caused by the reduced granularity. Reported results then vary widely depending on how these ties are resolved, which makes the evaluation less reliable. To address this, we propose a more robust retrieval evaluation protocol designed to reduce score variation. It consists of: (1) High-Precision Scoring (HPS), which upcasts the final scoring step to higher precision to resolve tied candidates at minimal computational cost; and (2) Tie-aware Retrieval Metrics (TRM), which report expected scores, range, and bias to quantify the order uncertainty of tied candidates. Our experiments test multiple models with three scoring functions on twelve retrieval datasets and demonstrate that HPS dramatically reduces tie-induced instability, while TRM accurately recovers expected metric values. This combination enables more consistent and reliable evaluation of low-precision retrieval.
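The tie problem described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the helper names `to_fp16` and `tie_aware_rr`, the example scores, and the choice of reciprocal rank as the metric are all assumptions made for illustration. Low precision is simulated by rounding only the final score to IEEE 754 half precision, and tied documents are assumed to be ordered uniformly at random.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (simulated low precision)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def tie_aware_rr(scores, relevant):
    """Return (expected, worst, best) reciprocal rank of one relevant document.

    Assumption: every ordering of the tied documents is equally likely.
    """
    target = scores[relevant]
    higher = sum(s > target for s in scores)  # documents ranked strictly above
    tied = sum(s == target for s in scores)   # tie group size, relevant doc included
    ranks = range(higher + 1, higher + tied + 1)  # ranks the relevant doc may take
    expected = sum(1.0 / r for r in ranks) / tied
    return expected, 1.0 / ranks[-1], 1.0 / ranks[0]


# Hypothetical relevance scores; the document at index 2 is the relevant one.
scores = [0.90, 0.70012, 0.70031, 0.70005, 0.65]
fp16_scores = [to_fp16(s) for s in scores]

print("high precision:", tie_aware_rr(scores, relevant=2))
print("half precision:", tie_aware_rr(fp16_scores, relevant=2))
```

In this example the three scores near 0.7 collapse to the same half-precision value, so the relevant document can land at rank 2, 3, or 4. The reciprocal rank then ranges from 0.25 to 0.5 with an expected value of 13/36 (about 0.361), whereas scoring in higher precision gives a single value of 0.5. Reporting the expectation together with the range is one way to expose the order uncertainty that the abstract attributes to tied candidates.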

Anthology ID: 2026.acl-short.33
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 396–409
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-short.33/
Cite (ACL): Kisu Yang, Yoonna Jang, Hwanseok Jang, Kenneth Choi, Isabelle Augenstein, and Heuiseok Lim. 2026. Reliable Evaluation Protocol for Low-Precision Retrieval. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 396–409, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Reliable Evaluation Protocol for Low-Precision Retrieval (Yang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-short.33.pdf
Checklist: 2026.acl-short.33.checklist.pdf
