Self-Supervised Detection of Contextual Synonyms in a Multi-Class Setting: Phenotype Annotation Use Case
Jingqing Zhang, Luis Bolanos Trujillo, Tong Li, Ashwani Tanwar, Guilherme Freire, Xian Yang, Julia Ive, Vibhor Gupta, Yike Guo
Abstract
Contextualised word embeddings is a powerful tool to detect contextual synonyms. However, most of the current state-of-the-art (SOTA) deep learning concept extraction methods remain supervised and underexploit the potential of the context. In this paper, we propose a self-supervised pre-training approach which is able to detect contextual synonyms of concepts being training on the data created by shallow matching. We apply our methodology in the sparse multi-class setting (over 15,000 concepts) to extract phenotype information from electronic health records. We further investigate data augmentation techniques to address the problem of the class sparsity. Our approach achieves a new SOTA for the unsupervised phenotype concept annotation on clinical text on F1 and Recall outperforming the previous SOTA with a gain of up to 4.5 and 4.0 absolute points, respectively. After fine-tuning with as little as 20% of the labelled data, we also outperform BioBERT and ClinicalBERT. The extrinsic evaluation on three ICU benchmarks also shows the benefit of using the phenotypes annotated by our model as features.- Anthology ID:
- 2021.emnlp-main.690
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 8754–8769
- Language:
- URL:
- https://aclanthology.org/2021.emnlp-main.690
- DOI:
- 10.18653/v1/2021.emnlp-main.690
- Cite (ACL):
- Jingqing Zhang, Luis Bolanos Trujillo, Tong Li, Ashwani Tanwar, Guilherme Freire, Xian Yang, Julia Ive, Vibhor Gupta, and Yike Guo. 2021. Self-Supervised Detection of Contextual Synonyms in a Multi-Class Setting: Phenotype Annotation Use Case. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8754–8769, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Self-Supervised Detection of Contextual Synonyms in a Multi-Class Setting: Phenotype Annotation Use Case (Zhang et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2021.emnlp-main.690.pdf