Literature discovery with natural language queries

Anna Kiepura; Jessica Lam; Nianlong Gu; Richard Hahnloser

doi:10.18653/v1/2025.sdp-1.8

Abstract

Literature discovery is a critical component of scientific research. Modern discovery systems leveraging Large Language Models (LLMs) are increasingly adopted for their ability to process natural language queries (NLQs). To assess the robustness of such systems, we compile two NLQ datasets and submit them to nine widely used discovery platforms. Our findings reveal that LLM-based search engines struggle with precisely formulated queries, often producing numerous false positives. However, precision improves when LLMs are used not for direct retrieval but to convert NLQs into structured keyword-based queries. As a result, hybrid systems that integrate both LLM-driven and keyword-based approaches outperform purely keyword-based or purely LLM-based discovery methods.
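The abstract contrasts two approaches. In one, an LLM retrieves papers directly. In the other, an LLM converts an NLQ into a structured keyword-based query. The comparison is made in terms of precision, which drops as false positives accumulate. The Python sketch below illustrates both ideas under stated assumptions. The query format (an AND over OR-groups of terms), the helper names `matches` and `precision`, and the three-document corpus are all hypothetical. They are not the authors' datasets, platforms, or evaluation code. The LLM conversion step is also not modeled: the structured query is written by hand.

```python
from typing import Iterable

# Hypothetical structured keyword query: every inner list is an OR-group of
# terms, and all groups must match (AND). In the setting the abstract describes,
# an LLM would produce such a structure from an NLQ; here it is hand-written.
Query = list[list[str]]


def matches(text: str, query: Query) -> bool:
    """True if each OR-group has at least one term occurring in text (case-insensitive substring match)."""
    lowered = text.lower()
    return all(any(term.lower() in lowered for term in group) for group in query)


def precision(retrieved: Iterable[str], relevant: set[str]) -> float:
    """Share of retrieved documents that are relevant, i.e. 1 minus the false-positive share."""
    retrieved_list = list(retrieved)
    if not retrieved_list:
        return 0.0
    true_positives = sum(1 for doc_id in retrieved_list if doc_id in relevant)
    return true_positives / len(retrieved_list)


# Hypothetical toy corpus and relevance judgments, for illustration only.
corpus = {
    "d1": "Dense retrieval for scientific literature search",
    "d2": "Keyword search over biomedical abstracts",
    "d3": "Large language models for literature discovery",
}
query: Query = [["literature"], ["retrieval", "discovery", "search"]]
hits = [doc_id for doc_id, text in corpus.items() if matches(text, query)]
print(hits, precision(hits, relevant={"d3"}))  # prints ['d1', 'd3'] 0.5
```

Here "d1" matches the keyword query but is judged irrelevant, so it counts as a false positive and halves the precision. Requiring every OR-group to match is one simple way to make a query more restrictive. It is a design choice for this sketch, not the query syntax of any platform in the study.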

Anthology ID:: 2025.sdp-1.8
Volume:: Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Tirthankar Ghosal, Philipp Mayr, Amanpreet Singh, Aakanksha Naik, Georg Rehm, Dayne Freitag, Dan Li, Sonja Schimmler, Anita De Waard
Venues:: sdp | WS
Publisher:: Association for Computational Linguistics
Pages:: 83–95
URL:: https://preview.aclanthology.org/landing_page/2025.sdp-1.8/
DOI:: 10.18653/v1/2025.sdp-1.8
Cite (ACL):: Anna Kiepura, Jessica Lam, Nianlong Gu, and Richard Hahnloser. 2025. Literature discovery with natural language queries. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025), pages 83–95, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Literature discovery with natural language queries (Kiepura et al., sdp 2025)
PDF:: https://preview.aclanthology.org/landing_page/2025.sdp-1.8.pdf