SciCoQA: Quality Assurance for Scientific Paper–Code Alignment

Tim Baumgärtner; Iryna Gurevych

Abstract

Discrepancies between scientific papers and their code undermine reproducibility, a concern that grows as automated research agents scale scientific output beyond human review capacity. Whether LLMs can reliably detect such discrepancies has not been systematically measured. To this end, we present SciCoQA, a dataset of 635 paper-code discrepancies (92 real, 543 synthetic) for this cross-modal verification task. Across 22 evaluated models, even the best-performing LLMs, Gemini 3.1 Pro and GPT-5 Mini, detect only 46.7% of real-world discrepancies, revealing a critical gap in automated scientific quality assurance. We construct SciCoQA from GitHub issues and reproducibility papers, and propose a synthetic generation pipeline to scale beyond AI to Physics, Quantitative Biology, and other computational sciences. We further introduce a taxonomy of discrepancy types and categories to characterize the occurring mismatches. Our analysis shows that models particularly struggle with omitted paper details, long-context inputs, and papers outside their pre-training corpus.

Anthology ID: 2026.acl-long.1795
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 38740–38770
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1795/
Cite (ACL): Tim Baumgärtner and Iryna Gurevych. 2026. SciCoQA: Quality Assurance for Scientific Paper–Code Alignment. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 38740–38770, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SciCoQA: Quality Assurance for Scientific Paper–Code Alignment (Baumgärtner & Gurevych, ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1795.pdf
Checklist: 2026.acl-long.1795.checklist.pdf
