Handling Variance of Pretrained Language Models in Grading Evidence in the Medical Literature

Fajri Koto, Biaoyan Fang


Abstract
In this paper, we investigate the utility of modern pretrained language models for grading evidence in the medical literature, based on the ALTA 2021 shared task. We benchmark (1) domain-specific models optimized for medical text and (2) domain-generic models with rich latent discourse representations (i.e. ELECTRA, RoBERTa). Our empirical experiments reveal that these pretrained language models suffer from high variance, and that an ensemble method can improve model performance. We find that ELECTRA performs best, with an accuracy of 53.6% on the test set, outperforming the domain-specific models.
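The abstract does not specify how the ensemble is formed; a common way to reduce the run-to-run variance of fine-tuned language models is to train several runs (e.g. different random seeds) and combine their per-example predictions by majority vote. The sketch below illustrates that generic idea only; the function name, the label set, and the example predictions are all hypothetical, not taken from the paper.

```python
from collections import Counter

def majority_vote(runs):
    """Combine per-example predictions from several model runs
    (e.g. fine-tuning seeds) by majority vote. Ties are broken in
    favor of the label seen first, since Counter.most_common
    preserves insertion order for equal counts."""
    ensembled = []
    for preds in zip(*runs):  # predictions for one example across runs
        ensembled.append(Counter(preds).most_common(1)[0][0])
    return ensembled

# Three hypothetical runs over four examples; labels are illustrative
# evidence grades, not the shared task's actual label set.
runs = [
    ["A", "B", "B", "C"],
    ["A", "C", "B", "D"],
    ["B", "B", "B", "D"],
]
print(majority_vote(runs))  # ['A', 'B', 'B', 'D']
```

Averaging the models' class probabilities before taking the argmax is a common alternative to hard voting, and behaves similarly when runs disagree only on low-confidence examples.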
Anthology ID:
2021.alta-1.26
Volume:
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association
Month:
December
Year:
2021
Address:
Online
Editors:
Afshin Rahimi, William Lane, Guido Zuccon
Venue:
ALTA
Publisher:
Australasian Language Technology Association
Pages:
218–223
URL:
https://aclanthology.org/2021.alta-1.26
Cite (ACL):
Fajri Koto and Biaoyan Fang. 2021. Handling Variance of Pretrained Language Models in Grading Evidence in the Medical Literature. In Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association, pages 218–223, Online. Australasian Language Technology Association.
Cite (Informal):
Handling Variance of Pretrained Language Models in Grading Evidence in the Medical Literature (Koto & Fang, ALTA 2021)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.alta-1.26.pdf
Data
ALTA 2021 Shared Task