ACL-rlg: A Dataset for Reading List Generation
Julien Aubert-Béduchaud, Florian Boudin, Béatrice Daille, Richard Dufour
Abstract
Familiarizing oneself with a new scientific field and its existing literature can be daunting due to the large amount of available articles. Curated lists of academic references, or reading lists, compiled by experts, offer a structured way to gain a comprehensive overview of a domain or a specific scientific challenge. In this work, we introduce ACL-rlg, the largest open expert-annotated reading list dataset. We also provide multiple baselines for evaluating reading list generation and formally define it as a retrieval task. Our qualitative study highlights that traditional scholarly search engines and indexing methods perform poorly on this task, and GPT-4o, despite showing better results, exhibits signs of potential data contamination.- Anthology ID:
- 2025.coling-main.327
- Original:
- 2025.coling-main.327v1
- Version 2:
- 2025.coling-main.327v2
- Volume:
- Proceedings of the 31st International Conference on Computational Linguistics
- Month:
- January
- Year:
- 2025
- Address:
- Abu Dhabi, UAE
- Editors:
- Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
- Venue:
- COLING
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 4910–4919
- Language:
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.coling-main.327/
- DOI:
- Cite (ACL):
- Julien Aubert-Béduchaud, Florian Boudin, Béatrice Daille, and Richard Dufour. 2025. ACL-rlg: A Dataset for Reading List Generation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 4910–4919, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal):
- ACL-rlg: A Dataset for Reading List Generation (Aubert-Béduchaud et al., COLING 2025)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.coling-main.327.pdf