An Overview of the Active Gene Annotation Corpus and the BioNLP OST 2019 AGAC Track Tasks

Yuxing Wang, Kaiyin Zhou, Mina Gachloo, Jingbo Xia


Abstract
The active gene annotation corpus (AGAC) was developed to support knowledge discovery for drug repurposing. Based on the corpus, the AGAC track of the BioNLP Open Shared Tasks 2019 was organized to facilitate cross-disciplinary collaboration between the BioNLP and Pharmacoinformatics communities on drug repurposing. The AGAC track consists of three subtasks: 1) named entity recognition, 2) thematic relation extraction, and 3) loss of function (LOF) / gain of function (GOF) topic classification. Five teams participated in the AGAC track, and their performance is compared and analyzed. The results revealed substantial room for improvement in the design of the task, which we analyzed in terms of “imbalanced data”, “selective annotation” and “latent topic annotation”.
Anthology ID:
D19-5710
Volume:
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks
Month:
November
Year:
2019
Address:
Hong Kong, China
Venue:
BioNLP
Publisher:
Association for Computational Linguistics
Pages:
62–71
URL:
https://aclanthology.org/D19-5710
DOI:
10.18653/v1/D19-5710
Cite (ACL):
Yuxing Wang, Kaiyin Zhou, Mina Gachloo, and Jingbo Xia. 2019. An Overview of the Active Gene Annotation Corpus and the BioNLP OST 2019 AGAC Track Tasks. In Proceedings of the 5th Workshop on BioNLP Open Shared Tasks, pages 62–71, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
An Overview of the Active Gene Annotation Corpus and the BioNLP OST 2019 AGAC Track Tasks (Wang et al., BioNLP 2019)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/D19-5710.pdf
Code
 YaoXinZhi/BERT-CRF-for-BioNLP-OST2019-AGAC-Task1