An Augmented Benchmark Dataset for Geometric Question Answering through Dual Parallel Text Encoding

Jie Cao, Jing Xiao


Abstract
Automatic math problem solving has recently attracted much attention from NLP researchers. However, most work focuses on solving Math Word Problems (MWPs). In this paper, we study geometric problem solving based on neural networks. Solving geometric problems requires integrating text and diagram information as well as knowledge of the relevant theorems. The lack of high-quality datasets and efficient neural geometric solvers impedes the development of automatic geometric problem solving. Building on GeoQA, we annotate 2,518 new geometric problems with richer types and greater difficulty to form an augmented benchmark dataset, GeoQA+, containing 7,528 problems in total, 6,027 of them in the training set. We further apply data augmentation to expand the training set to 12,054 problems. In addition, we design a Dual Parallel text Encoder (DPE) to efficiently encode long and medium-length problem text. The experimental results validate the effectiveness of GeoQA+ and the DPE module, improving the accuracy of automatic geometric problem solving to 66.09%.
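The dataset sizes quoted in the abstract can be cross-checked with simple arithmetic. The sketch below uses only the figures stated there; the variable names are illustrative, and the observation that the augmented training set is exactly twice the original size is a property of the numbers, not a description of the authors' augmentation procedure.

```python
# Figures taken from the abstract; variable names are illustrative only.
new_problems = 2518      # newly annotated problems
total_problems = 7528    # all problems in GeoQA+
train_problems = 6027    # training set before augmentation
augmented_train = 12054  # training set after augmentation

# Problems that were not newly annotated (carried over from GeoQA, per the abstract).
inherited = total_problems - new_problems

# Problems outside the training set (the abstract does not say how they are split).
held_out = total_problems - train_problems

# Growth factor of the training set under augmentation.
ratio = augmented_train / train_problems

print(inherited, held_out, ratio)
```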
Anthology ID:
2022.coling-1.130
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1511–1520
URL:
https://aclanthology.org/2022.coling-1.130
Cite (ACL):
Jie Cao and Jing Xiao. 2022. An Augmented Benchmark Dataset for Geometric Question Answering through Dual Parallel Text Encoding. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1511–1520, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
An Augmented Benchmark Dataset for Geometric Question Answering through Dual Parallel Text Encoding (Cao & Xiao, COLING 2022)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2022.coling-1.130.pdf
Data
Geometry3K