Atlas: Customizing Large Language Models for Reliable Bibliographic Retrieval and Verification

Akash Kodali, Hailu Xu, Wenlu Zhang, Xin Qin


Abstract
Large Language Models (LLMs) are increasingly used for citation retrieval, yet their bibliographic outputs often contain hallucinated or inconsistent metadata. This paper examines whether structured prompting improves citation reliability compared with traditional API-based retrieval. We implement a three-stage BibTeX-fetching pipeline: a baseline Crossref resolver, a standard GPT prompting method, and a customized verification-guided GPT configuration. Across heterogeneous reference inputs, we evaluate retrieval coverage, field completeness, and metadata accuracy against Crossref ground truth. Results show that prompting improves coverage and completeness. Our findings highlight the importance of prompt design for building reliable, LLM-driven bibliographic retrieval systems.
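The paper's actual implementation is not reproduced on this page. As a minimal sketch of what the baseline Crossref-resolver stage and the field-completeness metric might look like, the example below maps a record in Crossref's REST API schema (`author`, `title`, `container-title`, `issued.date-parts`, `DOI`) to a BibTeX entry and scores it against an assumed set of required fields; the helper names and the required-field list are illustrative assumptions, not the authors' code.

```python
# Sketch of a Crossref-style resolver stage (hypothetical helper names;
# the paper's actual pipeline is not shown on this page).
from typing import Dict

# Assumed completeness criteria -- the paper's exact field set may differ.
REQUIRED_FIELDS = ("author", "title", "journal", "year")

def crossref_to_bibtex(work: Dict, key: str = "ref") -> str:
    """Map a Crossref REST API 'message' record to a BibTeX @article entry.

    Missing fields are simply omitted, mirroring real-world incompleteness.
    """
    fields = {}
    authors = work.get("author", [])
    if authors:
        fields["author"] = " and ".join(
            f"{a.get('family', '')}, {a.get('given', '')}" for a in authors
        )
    if work.get("title"):
        fields["title"] = work["title"][0]            # Crossref titles are lists
    if work.get("container-title"):
        fields["journal"] = work["container-title"][0]
    issued = work.get("issued", {}).get("date-parts", [[None]])
    if issued[0][0]:
        fields["year"] = str(issued[0][0])
    if work.get("DOI"):
        fields["doi"] = work["DOI"]
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items())
    return f"@article{{{key},\n{body}\n}}"

def field_completeness(work: Dict) -> float:
    """Fraction of assumed-required BibTeX fields present in the entry."""
    entry = crossref_to_bibtex(work)
    return sum(f" {f} = " in entry for f in REQUIRED_FIELDS) / len(REQUIRED_FIELDS)

# Example record in Crossref's response schema (placeholder DOI).
record = {
    "author": [{"family": "Kodali", "given": "Akash"}],
    "title": ["Atlas: Customizing Large Language Models ..."],
    "container-title": [],                # missing venue -> incomplete metadata
    "issued": {"date-parts": [[2025]]},
    "DOI": "10.0000/example",
}
print(field_completeness(record))         # 3 of 4 required fields -> 0.75
```

In the full pipeline described in the abstract, a score like this would be compared across the Crossref baseline, standard GPT prompting, and the verification-guided configuration, with Crossref metadata serving as ground truth.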
Anthology ID:
2025.wasp-main.14
Volume:
Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications
Month:
December
Year:
2025
Address:
Mumbai, India and virtual
Editors:
Alberto Accomazzi, Tirthankar Ghosal, Felix Grezes, Kelly Lockhart
Venues:
WASP | WS
Publisher:
Association for Computational Linguistics
Pages:
121–126
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.14/
Cite (ACL):
Akash Kodali, Hailu Xu, Wenlu Zhang, and Xin Qin. 2025. Atlas: Customizing Large Language Models for Reliable Bibliographic Retrieval and Verification. In Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications, pages 121–126, Mumbai, India and virtual. Association for Computational Linguistics.
Cite (Informal):
Atlas: Customizing Large Language Models for Reliable Bibliographic Retrieval and Verification (Kodali et al., WASP 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.14.pdf