PeerQA: A Scientific Question Answering Dataset from Peer Reviews

Tim Baumgärtner, Ted Briscoe, Iryna Gurevych


Abstract
We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions have been sourced from peer reviews, which contain questions that reviewers raised while thoroughly examining the scientific article. Answers have been annotated by the original authors of each paper. The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP, as well as a subset of other scientific communities like Geoscience and Public Health. PeerQA supports three critical tasks for developing practical QA systems: evidence retrieval, unanswerable question classification, and answer generation. We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks. Our experiments and analyses reveal the need for decontextualization in document-level retrieval, where we find that even simple decontextualization approaches consistently improve retrieval performance across architectures. On answer generation, PeerQA serves as a challenging benchmark for long-context modeling, as the papers have an average size of 12k tokens.
Anthology ID:
2025.naacl-long.22
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
508–544
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.22/
Cite (ACL):
Tim Baumgärtner, Ted Briscoe, and Iryna Gurevych. 2025. PeerQA: A Scientific Question Answering Dataset from Peer Reviews. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 508–544, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
PeerQA: A Scientific Question Answering Dataset from Peer Reviews (Baumgärtner et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.22.pdf