FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning

Xiao Li, Bolin Zhu, Kaiwen Shi, Sichen Liu, Yin Zhu, Yiwei Liu, Gong Cheng


Abstract
Applying physics formulas is a fundamental human capability in numerical reasoning. Existing datasets often rely on implicit mathematical knowledge and rarely make the underlying formulas explicit. To address this, we introduce FormulaReasoning, a new benchmark for formula-based numerical reasoning comprising 5,324 questions that require calculations grounded in external physics principles. We provide high-quality, fine-grained annotations in English and Chinese, including formula structures, parameter names, symbols, values, and units, curated through manual effort and LLM-assisted validation. Additionally, we provide a consolidated formula database as an external knowledge source. To further challenge models, we develop an extended version of the dataset by coupling multiple questions. We evaluate various architectures and methods, including retrieval-augmented methods, modular reasoning (formula generation, parameter extraction, and calculation), and preference-based optimization. Our analysis identifies critical challenges in formula-based reasoning and highlights significant opportunities for future methodological advancement.
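As a rough illustration of the modular reasoning named in the abstract (formula generation, parameter extraction, calculation), the following minimal Python sketch may help. It is not the authors' implementation and does not reproduce the dataset's schema: the `Parameter` and `Formula` classes, the `calculate` helper, and the heat-absorption question are all hypothetical, chosen only to show how a formula structure plus named parameters with symbols, values, and units could be turned into an answer.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    """One annotated quantity: name, symbol, value, and unit (hypothetical schema)."""
    name: str
    symbol: str
    value: float
    unit: str


@dataclass
class Formula:
    """A formula structure written as 'target = expression' (hypothetical schema)."""
    expression: str
    target_name: str
    target_unit: str


def calculate(formula: Formula, params: list[Parameter]) -> Parameter:
    """Calculation step: substitute parameter values into the formula's right-hand side."""
    lhs, rhs = (side.strip() for side in formula.expression.split("=", 1))
    values = {p.symbol: p.value for p in params}
    # eval is used only to keep the sketch short; builtins are disabled and
    # only the extracted parameter symbols are visible to the expression.
    result = eval(rhs, {"__builtins__": {}}, values)
    return Parameter(formula.target_name, lhs, result, formula.target_unit)


# Step 1 (formula generation): assume the formula Q = c * m * dT was selected
# for an invented question about heating 2 kg of water by 50 degrees Celsius.
formula = Formula("Q = c * m * dT", "heat absorbed", "J")

# Step 2 (parameter extraction): assume these values were read from the question.
params = [
    Parameter("specific heat capacity of water", "c", 4200.0, "J/(kg*degC)"),
    Parameter("mass of water", "m", 2.0, "kg"),
    Parameter("temperature change", "dT", 50.0, "degC"),
]

# Step 3 (calculation): evaluate the formula with the extracted parameters.
heat = calculate(formula, params)
print(f"{heat.name}: {heat.symbol} = {heat.value} {heat.unit}")
```

Keeping the three steps separate means an error can be attributed to the wrong formula, a wrongly extracted parameter, or the arithmetic itself, which is presumably the kind of fine-grained analysis the annotations are meant to support.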
Anthology ID:
2026.findings-acl.1580
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
31579–31601
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1580/
Cite (ACL):
Xiao Li, Bolin Zhu, Kaiwen Shi, Sichen Liu, Yin Zhu, Yiwei Liu, and Gong Cheng. 2026. FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 31579–31601, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning (Li et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1580.pdf
Checklist:
 2026.findings-acl.1580.checklist.pdf