Astro-mT5: Entity Extraction from Astrophysics Literature using mT5 Language Model

Madhusudan Ghosh, Payel Santra, Sk Asif Iqbal, Partha Basuchowdhuri


Abstract
Scientific research requires reading existing literature and extracting relevant information from it effectively. To gain insight into a collection of scientific documents, extracting entities and recognizing their types is considered an important task. Numerous studies have been conducted in this area. In our study, we introduce a framework for entity recognition and identification on the NASA astrophysics dataset, which was published as part of the DEAL shared task. We use a pre-trained multilingual model, based on a natural language processing framework, for the given sequence labeling tasks. Experiments show that our model, Astro-mT5, outperforms the existing baseline in astrophysics-related information extraction.
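To make the sequence labeling setting in the abstract concrete, the following is a minimal sketch of how per-token labels can be grouped into typed entities. It assumes a BIO tagging scheme; the function name bio_to_spans, the example sentence, and the tag names (CelestialObject, Telescope) are illustrative assumptions and are not taken from the paper. It does not reproduce the Astro-mT5 model itself.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration only; not the authors' code.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity, closing any open one.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the open entity and skip this token.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Made-up example sentence with assumed tag names.
tokens = ["We", "observed", "M31", "with", "the", "Hubble", "Space", "Telescope"]
tags = ["O", "O", "B-CelestialObject", "O", "O",
        "B-Telescope", "I-Telescope", "I-Telescope"]
print(bio_to_spans(tokens, tags))
```

A sequence labeling model predicts one tag per token; a decoding step of this kind turns those tags into the entity mentions and types that the task evaluates.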
Anthology ID:
2022.wiesp-1.12
Volume:
Proceedings of the first Workshop on Information Extraction from Scientific Publications
Month:
November
Year:
2022
Address:
Online
Venue:
WIESP
Publisher:
Association for Computational Linguistics
Pages:
100–104
URL:
https://aclanthology.org/2022.wiesp-1.12
Cite (ACL):
Madhusudan Ghosh, Payel Santra, Sk Asif Iqbal, and Partha Basuchowdhuri. 2022. Astro-mT5: Entity Extraction from Astrophysics Literature using mT5 Language Model. In Proceedings of the first Workshop on Information Extraction from Scientific Publications, pages 100–104, Online. Association for Computational Linguistics.
Cite (Informal):
Astro-mT5: Entity Extraction from Astrophysics Literature using mT5 Language Model (Ghosh et al., WIESP 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.wiesp-1.12.pdf