Abstract
In this paper, we describe our system and results submitted for the Natural Language Inference (NLI) track of the MEDIQA 2019 Shared Task. As the KU_ai team, we used BERT as our baseline model and pre-processed the MedNLI dataset to mitigate the negative impact of de-identification artifacts. Moreover, we investigated different pre-training and transfer learning approaches to improve performance. We show that pre-training the language model on rich biomedical corpora has a significant effect in teaching the model domain-specific language. In addition, training the model on large NLI datasets such as MultiNLI and SNLI helps it learn task-specific reasoning. Finally, we ensembled our highest-performing models and achieved 84.7% accuracy on the unseen test dataset, ranking 10th out of 17 teams in the official results.
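The de-identification artifacts mentioned above come from MedNLI's source corpus, MIMIC-III, whose clinical notes mark removed protected health information with placeholders such as `[**2120-8-23**]`. A minimal sketch of this kind of preprocessing is shown below; the placeholder patterns and surrogate values are illustrative assumptions, not the authors' exact rules:

```python
import re

# Assumed preprocessing step: MedNLI sentences inherit MIMIC-III-style
# de-identification placeholders such as "[**2120-8-23**]" (dates) or
# "[**Hospital1 18**]" (named entities). Left in place, these produce
# tokens BERT never saw during pre-training, so each placeholder is
# replaced with a plausible generic surrogate.

DEID_PATTERN = re.compile(r"\[\*\*(.*?)\*\*\]")
DATE_PATTERN = re.compile(r"^\d{4}-\d{1,2}-\d{1,2}$")

def surrogate(inner: str) -> str:
    """Map the text inside a placeholder to a generic surrogate."""
    inner = inner.strip()
    if DATE_PATTERN.match(inner):
        return "2020-01-01"      # generic date
    if "hospital" in inner.lower():
        return "the hospital"    # generic institution
    if "name" in inner.lower():
        return "the patient"     # generic person reference
    return "something"           # fallback for other PHI types

def clean_sentence(text: str) -> str:
    return DEID_PATTERN.sub(lambda m: surrogate(m.group(1)), text)

print(clean_sentence("Admitted to [**Hospital1 18**] on [**2120-8-23**]."))
# -> "Admitted to the hospital on 2020-01-01."
```

The exact surrogate chosen matters less than making the input look like ordinary English, so that BERT's pre-trained wordpiece vocabulary applies cleanly.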
- Anthology ID: W19-5045
- Volume: Proceedings of the 18th BioNLP Workshop and Shared Task
- Month: August
- Year: 2019
- Address: Florence, Italy
- Editors: Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue: BioNLP
- SIG: SIGBIOMED
- Publisher: Association for Computational Linguistics
- Pages: 427–436
- URL: https://aclanthology.org/W19-5045
- DOI: 10.18653/v1/W19-5045
- Cite (ACL): Cemil Cengiz, Ulaş Sert, and Deniz Yuret. 2019. KU_ai at MEDIQA 2019: Domain-specific Pre-training and Transfer Learning for Medical NLI. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 427–436, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal): KU_ai at MEDIQA 2019: Domain-specific Pre-training and Transfer Learning for Medical NLI (Cengiz et al., BioNLP 2019)
- PDF: https://preview.aclanthology.org/proper-vol2-ingestion/W19-5045.pdf
- Data: MultiNLI, SNLI