Numerical Correlation in Text

Daniel Spokoyny, Chien-Sheng Wu, Caiming Xiong


Abstract
Evaluating the quantitative reasoning of large language models is an important step towards understanding their current capabilities and limitations. We propose a new task, Numerical Correlation in Text, which requires models to identify the correlation between two numbers in a sentence. To this end, we introduce a new dataset, which contains over 2,000 Wikipedia sentences with two numbers and their correlation labels. Using this dataset, we show that recent numerically aware pretraining methods for language models do not help generalization on this task, posing a challenge for future work in this area.
Anthology ID:
2022.mathnlp-1.5
Volume:
Proceedings of the 1st Workshop on Mathematical Natural Language Processing (MathNLP)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Deborah Ferreira, Marco Valentino, Andre Freitas, Sean Welleck, Moritz Schubotz
Venue:
MathNLP
Publisher:
Association for Computational Linguistics
Pages:
33–39
URL:
https://aclanthology.org/2022.mathnlp-1.5
DOI:
10.18653/v1/2022.mathnlp-1.5
Cite (ACL):
Daniel Spokoyny, Chien-Sheng Wu, and Caiming Xiong. 2022. Numerical Correlation in Text. In Proceedings of the 1st Workshop on Mathematical Natural Language Processing (MathNLP), pages 33–39, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Numerical Correlation in Text (Spokoyny et al., MathNLP 2022)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2022.mathnlp-1.5.pdf
Video:
https://preview.aclanthology.org/naacl-24-ws-corrections/2022.mathnlp-1.5.mp4