WildSci: Advancing Scientific Reasoning from In-the-Wild Literature

Tengxiao Liu, Deepak Nathani, Zekun Li, Kevin Yang, William Yang Wang


Abstract
Recent progress in large language model (LLM) reasoning has focused on domains like mathematics and coding, where abundant high-quality data and objective evaluation metrics are readily available. In contrast, progress in scientific reasoning remains limited in domains such as medicine and materials science due to restricted dataset coverage and the inherent complexity of open-ended scientific questions. To address these challenges, we propose a general framework for sustainably generating scientific reasoning QA data and introduce WildSci, a new dataset of domain-specific science questions automatically synthesized from peer-reviewed literature, spanning 9 scientific disciplines and 26 subdomains. WildSci enables scalable training with well-defined reward signals in a multiple-choice format. We further apply reinforcement learning to finetune models on WildSci and analyze the resulting training dynamics, including domain-specific performance changes, response behaviors, and generalization trends. Experiments on a suite of scientific benchmarks demonstrate the effectiveness of our framework and dataset. We release WildSci to enable scalable and sustainable research in scientific reasoning.
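The abstract says the multiple-choice format gives "well-defined reward signals" for reinforcement learning. The abstract does not specify the exact reward, so the following is only a minimal sketch of one common way such a signal can be computed: a binary, rule-based reward that extracts the option letter a response commits to and compares it with the gold letter. The answer formats ("Answer: X" or "\boxed{X}"), the option range A to J, and the function names `extract_choice` and `multiple_choice_reward` are all hypothetical assumptions, not details taken from the paper.

```python
import re
from typing import Optional

# Hypothetical answer formats assumed for illustration: "Answer: B", "Answer: (B)",
# or "\boxed{B}". The real output format and option set used with WildSci may differ.
_ANSWER_PATTERNS = [
    re.compile(r"\\boxed\{\s*([A-J])\s*\}"),
    re.compile(r"[Aa]nswer\s*:\s*\(?([A-J])\b\)?"),
]


def extract_choice(response: str) -> Optional[str]:
    """Return the last option letter the response commits to, or None if none is found."""
    matches = []
    for pattern in _ANSWER_PATTERNS:
        for m in pattern.finditer(response):
            matches.append((m.start(), m.group(1)))
    if not matches:
        return None
    # Use the final answer in the text, since the reasoning may mention other options first.
    return max(matches)[1]


def multiple_choice_reward(response: str, gold: str) -> float:
    """Binary rule-based reward: 1.0 if the extracted choice equals the gold letter, else 0.0."""
    choice = extract_choice(response)
    return 1.0 if choice is not None and choice == gold.strip().upper() else 0.0


if __name__ == "__main__":
    print(multiple_choice_reward("The answer is clear from the data. Answer: C", "C"))
```

Because a multiple-choice answer can be checked by exact matching, a reward like this needs no learned judge, which is what makes the format attractive for scalable RL training; free-form scientific answers would not admit such a simple check.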
Anthology ID:
2026.findings-acl.567
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11677–11695
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.567/
Cite (ACL):
Tengxiao Liu, Deepak Nathani, Zekun Li, Kevin Yang, and William Yang Wang. 2026. WildSci: Advancing Scientific Reasoning from In-the-Wild Literature. In Findings of the Association for Computational Linguistics: ACL 2026, pages 11677–11695, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
WildSci: Advancing Scientific Reasoning from In-the-Wild Literature (Liu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.567.pdf
Checklist:
2026.findings-acl.567.checklist.pdf