Variable Extraction for Model Recovery in Scientific Literature

Chunwei Liu, Enrique Noriega-Atala, Adarsh Pyarelal, Clayton T Morrison, Mike Cafarella


Abstract
Because the scientific community's output keeps growing, it is difficult to keep up with the literature without the assistance of AI methods. This paper evaluates various methods for extracting mathematical model variables from epidemiological studies, such as “infection rate (𝛼),” “recovery rate (𝛾),” and “mortality rate (𝜇).” Although variable extraction appears to be a basic task, it plays a pivotal role in recovering models from scientific literature. Once extracted, these variables can be used for automatic mathematical modeling, simulation, and replication of published results. We also introduce a benchmark dataset comprising manually annotated variable descriptions and variable values extracted from scientific papers. Our analysis shows that LLM-based solutions perform best. Although combining rule-based extraction outputs with LLMs yields incremental benefits, the performance gain attributable to the transfer-learning and instruction-tuning capabilities of the LLMs themselves is far larger. This investigation demonstrates the potential of LLMs to enhance automatic comprehension of scientific artifacts and to support automatic model recovery and simulation.
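To make the task concrete, here is what a purely rule-based variable extractor might look like. This is a minimal sketch under stated assumptions, not the authors' system: the regular expressions, the `Variable` record, and the `extract_variables` function are all hypothetical. The rules assume that a description of one or two words appears immediately before a parenthesized symbol (as in “infection rate (𝛼)”) and that values are written as `symbol = number`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule (illustration only, not the paper's extractor):
# a one- or two-word description directly followed by a short parenthesized
# symbol, e.g. "infection rate (α)".
DESC_PATTERN = re.compile(r"((?:[A-Za-z-]+\s)?[A-Za-z-]+)\s*\(([^\s()]{1,3})\)")

# Hypothetical rule: a short symbol assigned a decimal number, e.g. "α = 0.3".
VALUE_PATTERN = re.compile(r"([^\s()=,]{1,3})\s*=\s*(\d+(?:\.\d+)?)")


@dataclass
class Variable:
    """One extracted model variable; fields stay None if no rule fired."""
    symbol: str
    description: Optional[str] = None
    value: Optional[float] = None


def extract_variables(text: str) -> dict[str, Variable]:
    """Map each symbol to a Variable using the two naive regex rules above."""
    found: dict[str, Variable] = {}
    # Attach descriptions found in "description (symbol)" spans.
    for desc, sym in DESC_PATTERN.findall(text):
        found.setdefault(sym, Variable(sym)).description = desc
    # Attach numeric values found in "symbol = number" spans.
    for sym, num in VALUE_PATTERN.findall(text):
        found.setdefault(sym, Variable(sym)).value = float(num)
    return found


if __name__ == "__main__":
    sentence = (
        "We model the infection rate (α), recovery rate (γ), and "
        "mortality rate (μ), with α = 0.3 and γ = 0.1."
    )
    for var in extract_variables(sentence).values():
        print(var)
```

Rules like these are brittle: they miss longer descriptions, variables defined far from their symbols, and values stated in prose or tables. That brittleness helps illustrate why the paper compares rule-based extraction with, and combines it with, LLM-based methods.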
Anthology ID: 2025.aisd-main.1
Volume: Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities
Month: May
Year: 2025
Address: Albuquerque, New Mexico, USA
Editors: Peter Jansen, Bhavana Dalvi Mishra, Harsh Trivedi, Bodhisattwa Prasad Majumder, Tom Hope, Tushar Khot, Doug Downey, Eric Horvitz
Venues: AISD | WS
Publisher: Association for Computational Linguistics
Pages: 1–12
URL: https://preview.aclanthology.org/fix-sig-urls/2025.aisd-main.1/
Cite (ACL): Chunwei Liu, Enrique Noriega-Atala, Adarsh Pyarelal, Clayton T Morrison, and Mike Cafarella. 2025. Variable Extraction for Model Recovery in Scientific Literature. In Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities, pages 1–12, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Variable Extraction for Model Recovery in Scientific Literature (Liu et al., AISD 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.aisd-main.1.pdf