WildSci: Advancing Scientific Reasoning from In-the-Wild Literature
Tengxiao Liu, Deepak Nathani, Zekun Li, Kevin Yang, William Yang Wang
Abstract
Recent progress in large language model (LLM) reasoning has focused on domains like mathematics and coding, where abundant high-quality data and objective evaluation metrics are readily available. In contrast, progress in scientific reasoning remains limited in domains such as medicine and materials science due to restricted dataset coverage and the inherent complexity of open-ended scientific questions. To address these challenges, we propose a general framework for sustainable scientific reasoning QA generation, and introduce WildSci, a new dataset of domain-specific science questions automatically synthesized from peer-reviewed literature, spanning 9 scientific disciplines and 26 subdomains. WildSci enables scalable training with well-defined reward signals in a multiple-choice format. We further apply reinforcement learning to finetune models on WildSci and analyze the resulting training dynamics, including domain-specific performance changes, response behaviors, and generalization trends. Experiments on a suite of scientific benchmarks demonstrate the effectiveness of our framework and dataset. We release WildSci to enable scalable and sustainable research in scientific reasoning.
- Anthology ID:
- 2026.findings-acl.567
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 11677–11695
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.567/
- Cite (ACL):
- Tengxiao Liu, Deepak Nathani, Zekun Li, Kevin Yang, and William Yang Wang. 2026. WildSci: Advancing Scientific Reasoning from In-the-Wild Literature. In Findings of the Association for Computational Linguistics: ACL 2026, pages 11677–11695, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- WildSci: Advancing Scientific Reasoning from In-the-Wild Literature (Liu et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.567.pdf