Automated Discovery of Mathematical Definitions in Text

Natalia Vanetik, Marina Litvak, Sergey Shevchuk, Lior Reznik


Abstract
Automatic definition extraction from texts is an important task with numerous applications in natural language processing, such as summarization, analysis of scientific texts, automatic taxonomy generation, ontology generation, concept identification, and question answering. For definitions that are contained within a single sentence, this problem can be viewed as a binary classification of sentences into definitions and non-definitions. Definitions in scientific literature can be generic (as in Wikipedia) or more formal (as in mathematical articles). In this paper, we focus on automatic detection of one-sentence definitions in mathematical texts, which are difficult to separate from the surrounding text. We experiment with several data representations, which include sentence syntactic structure and word embeddings, and apply deep learning methods such as convolutional neural networks (CNN) and recurrent neural networks (RNN) in order to identify mathematical definitions. Our experiments demonstrate the superiority of CNN and its combination with RNN, applied to the syntactically enriched input representation. We also present a new dataset for definition extraction from mathematical texts. We demonstrate that using this dataset to train learning models improves the quality of definition extraction when these models are then applied to other definition datasets. Our experiments with different domains confirm that mathematical definitions require special treatment, and that using cross-domain learning is inefficient.
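To make the binary-classification framing in the abstract concrete, here is a minimal sketch in which each sentence is mapped to a definition / non-definition label. It is not the paper's CNN or RNN models: it swaps in a simple cue-phrase baseline purely for illustration, and the cue list and the names `CUE_PATTERNS` and `is_definition` are assumptions that do not come from the paper.

```python
import re

# Hypothetical cue phrases chosen for illustration; the paper does not list these.
CUE_PATTERNS = [
    r"\bis called\b",
    r"\bis said to be\b",
    r"\bis defined (as|to be)\b",
    r"\bwe (say|call)\b",
]
_CUES = re.compile("|".join(CUE_PATTERNS), re.IGNORECASE)


def is_definition(sentence: str) -> bool:
    """Label a single sentence: True = definition, False = non-definition."""
    return _CUES.search(sentence) is not None


# Toy input: one definitional sentence and one ordinary sentence.
sentences = [
    "A group is called abelian if its operation is commutative.",
    "We now prove the main theorem of this section.",
]
labels = [is_definition(s) for s in sentences]
```

A rule-based labeller like this only shows the input/output shape of the task. A learned classifier, such as the models the abstract describes, would replace `is_definition` while keeping the same one-label-per-sentence interface.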
Anthology ID:
2020.lrec-1.256
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2086–2094
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.256
Cite (ACL):
Natalia Vanetik, Marina Litvak, Sergey Shevchuk, and Lior Reznik. 2020. Automated Discovery of Mathematical Definitions in Text. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2086–2094, Marseille, France. European Language Resources Association.
Cite (Informal):
Automated Discovery of Mathematical Definitions in Text (Vanetik et al., LREC 2020)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2020.lrec-1.256.pdf