TDAC, The First Corpus in Time-Domain Astrophysics: Analysis and First Experiments on Named Entity Recognition

Atilla Kaan Alkan, Cyril Grouin, Fabian Schussler, Pierre Zweigenbaum


Abstract
The increased interest in time-domain astronomy over the last decades has resulted in a substantial increase in the publication of observation reports, saturating astrophysicists' capacity to read, analyze, and classify information. Because detected astronomical events are short-lived, information characterizing new phenomena has to be communicated and analyzed very rapidly so that other observatories can react and conduct follow-up observations. This paper introduces TDAC, the first corpus in time-domain astrophysics, based on observation reports. We also present our NLP experiments on named entity recognition, based on our own annotations and on annotations from the WIESP NLP Challenge.
Anthology ID:
2022.wiesp-1.15
Volume:
Proceedings of the first Workshop on Information Extraction from Scientific Publications
Month:
November
Year:
2022
Address:
Online
Venue:
WIESP
Publisher:
Association for Computational Linguistics
Pages:
131–139
URL:
https://aclanthology.org/2022.wiesp-1.15
Cite (ACL):
Atilla Kaan Alkan, Cyril Grouin, Fabian Schussler, and Pierre Zweigenbaum. 2022. TDAC, The First Corpus in Time-Domain Astrophysics: Analysis and First Experiments on Named Entity Recognition. In Proceedings of the first Workshop on Information Extraction from Scientific Publications, pages 131–139, Online. Association for Computational Linguistics.
Cite (Informal):
TDAC, The First Corpus in Time-Domain Astrophysics: Analysis and First Experiments on Named Entity Recognition (Alkan et al., WIESP 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.wiesp-1.15.pdf