Finding Diamonds in Conversation Haystacks: A Benchmark for Conversational Data Retrieval

Yohan Lee, Yongwoo Song, Sangyeop Kim


Abstract
We present the Conversational Data Retrieval (CDR) benchmark, the first comprehensive test set for evaluating systems that retrieve conversation data for product insights. With 1.6k queries across five analytical tasks and 9.1k conversations, our benchmark provides a reliable standard for measuring conversational data retrieval performance. Our evaluation of 16 popular embedding models shows that even the best models reach an NDCG@10 of only about 0.51, revealing a substantial gap between document retrieval and conversational data retrieval capabilities. Our work identifies unique challenges in conversational data retrieval (implicit state recognition, turn dynamics, contextual references) and provides practical query templates and a detailed error analysis across task categories. The benchmark dataset and code are available at https://github.com/l-yohai/CDR-Benchmark.
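The sketch below illustrates the kind of NDCG@10 evaluation the abstract describes: embed each conversation and each query with an off-the-shelf embedding model, rank conversations by cosine similarity, and average NDCG@10 over queries. The file names, data layout, and choice of model are assumptions for illustration only; the benchmark's actual data format and evaluation scripts are in the repository linked above.

```python
# Hypothetical sketch of scoring an embedding model on a CDR-style benchmark
# with NDCG@10. File names and JSON structure are assumptions, not the
# official format; see https://github.com/l-yohai/CDR-Benchmark for the real one.
import json
import numpy as np
from sentence_transformers import SentenceTransformer

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k: gain 1 for each relevant conversation retrieved."""
    gains = [1.0 if cid in relevant_ids else 0.0 for cid in ranked_ids[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant_ids), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Assumed inputs: queries.json maps each query string to its relevant conversation
# ids; conversations.json maps each conversation id to a flattened transcript.
queries = json.load(open("queries.json"))
conversations = json.load(open("conversations.json"))

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model under test
conv_ids = list(conversations)
conv_emb = model.encode([conversations[c] for c in conv_ids],
                        normalize_embeddings=True)

scores = []
for query, relevant in queries.items():
    q_emb = model.encode(query, normalize_embeddings=True)
    sims = conv_emb @ q_emb                      # cosine similarity (unit vectors)
    ranked = [conv_ids[i] for i in np.argsort(-sims)]
    scores.append(ndcg_at_k(ranked, set(relevant), k=10))

print(f"NDCG@10: {np.mean(scores):.3f}")
```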
Anthology ID:
2025.emnlp-industry.162
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Note:
Pages:
2343–2366
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.162/
Cite (ACL):
Yohan Lee, Yongwoo Song, and Sangyeop Kim. 2025. Finding Diamonds in Conversation Haystacks: A Benchmark for Conversational Data Retrieval. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 2343–2366, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Finding Diamonds in Conversation Haystacks: A Benchmark for Conversational Data Retrieval (Lee et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.162.pdf