Qingzhi Liu
2026
SciText2Eq: Assessing LLMs for Explainable Equation Generation for Scientific Creativity
Yifan Mo | Xiao Fu | Yue Su | Qingyu Meng | Koen Hindriks | Qingzhi Liu | Jiahuan Pei
Findings of the Association for Computational Linguistics: ACL 2026
This work investigates the ability of large language models (LLMs) to generate mathematical equations from scientific texts. Prior work faces challenges in unstructured grounding, multi-equation dependency, and human-aligned evaluation. To address these challenges, we construct a dataset of AI research papers that pairs contextual passages with ground-truth equations and variable descriptions. We develop an explainable equation generation workflow and evaluate it across diverse open- and closed-source LLMs. Our evaluation protocol combines automatic metrics, LLM-based rubrics, and human judgments to assess accuracy, explainability, and human-LLM alignment. Results show that LLMs achieve moderate performance on lexical and syntactic similarity but struggle with semantic accuracy. LLM-based evaluations show limited alignment with human judgments, which highlights how difficult it is to assess equation quality. These findings provide insights for improving equation generation models and for developing more reliable evaluation methods for scientific creativity. We provide code and data for reproducibility.
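To make the gap between lexical similarity and semantic accuracy concrete, here is a minimal sketch of one possible lexical metric: token-level F1 between a generated and a reference LaTeX equation. It is an illustrative assumption, not the paper's metric; the abstract does not name its automatic metrics, and the names `tokenize_latex` and `token_f1` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: LaTeX commands, numbers, single letters, then any other symbol.
_TOKEN = re.compile(r"\\[A-Za-z]+|\d+(?:\.\d+)?|[A-Za-z]|\S")


def tokenize_latex(eq: str) -> list[str]:
    """Split a LaTeX equation string into coarse lexical tokens."""
    return _TOKEN.findall(eq)


def token_f1(generated: str, reference: str) -> float:
    """Bag-of-tokens F1 between two equations (token order is ignored)."""
    gen = Counter(tokenize_latex(generated))
    ref = Counter(tokenize_latex(reference))
    overlap = sum((gen & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Same tokens, different meaning: a purely lexical score cannot tell these apart.
score = token_f1("a - b", "b - a")
```

In this sketch `a - b` and `b - a` receive a perfect score even though they are not equivalent, which is the kind of case where a lexical metric and a semantic judgment can disagree.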