FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning
Xiao Li, Bolin Zhu, Kaiwen Shi, Sichen Liu, Yin Zhu, Yiwei Liu, Gong Cheng
Abstract
The application of physics formulas is a fundamental human capability in numerical reasoning. Existing datasets often rely on implicit mathematical knowledge and rarely make the underlying formulas explicit. To address this, we introduce FormulaReasoning, a new benchmark for formula-based numerical reasoning comprising 5,324 questions requiring calculations grounded in external physics principles. We provide high-quality, fine-grained annotations in English and Chinese (formula structures, parameter names, symbols, values, and units), curated through manual effort and LLM-assisted validation. Additionally, we provide a consolidated formula database as an external knowledge source. To further challenge model performance, we develop an extended version of the dataset by coupling multiple questions. We evaluate various architectural and methodological frameworks, including retrieval-augmented methods, modular reasoning (formula generation, parameter extraction, and calculation), and preference-based optimization. Our analysis identifies critical challenges in formula-based reasoning, highlighting significant opportunities for future methodological advancement.
- Anthology ID:
- 2026.findings-acl.1580
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 31579–31601
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1580/
- Cite (ACL):
- Xiao Li, Bolin Zhu, Kaiwen Shi, Sichen Liu, Yin Zhu, Yiwei Liu, and Gong Cheng. 2026. FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 31579–31601, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning (Li et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1580.pdf