MathAlign: Linking Formula Identifiers to their Contextual Natural Language Descriptions
Maria Alexeeva, Rebecca Sharp, Marco A. Valenzuela-Escárcega, Jennifer Kadowaki, Adarsh Pyarelal, Clayton Morrison
Abstract
Extending machine reading approaches to extract mathematical concepts and their descriptions is useful for a variety of tasks, ranging from mathematical information retrieval to increasing accessibility of scientific documents for the visually impaired. This entails segmenting mathematical formulae into identifiers and linking them to their natural language descriptions. We propose a rule-based approach for this task, which extracts LaTeX representations of formula identifiers and links them to their in-text descriptions, given only the original PDF and the location of the formula of interest. We also present a novel evaluation dataset for this task, as well as the tool used to create it.
- Anthology ID:
- 2020.lrec-1.269
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 2204–2212
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.269
- Cite (ACL):
- Maria Alexeeva, Rebecca Sharp, Marco A. Valenzuela-Escárcega, Jennifer Kadowaki, Adarsh Pyarelal, and Clayton Morrison. 2020. MathAlign: Linking Formula Identifiers to their Contextual Natural Language Descriptions. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2204–2212, Marseille, France. European Language Resources Association.
- Cite (Informal):
- MathAlign: Linking Formula Identifiers to their Contextual Natural Language Descriptions (Alexeeva et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2020.lrec-1.269.pdf