Cross-Language Transfer of High-Quality Annotations: Combining Neural Machine Translation with Cross-Linguistic Span Alignment to Apply NER to Clinical Texts in a Low-Resource Language
Henning Schäfer, Ahmad Idrissi-Yaghir, Peter Horn, Christoph Friedrich
Abstract
In this work, cross-linguistic span prediction based on contextualized word embedding models is combined with neural machine translation (NMT) to transfer and apply state-of-the-art natural language processing (NLP) models to a clinical corpus in a low-resource language. Two directions are evaluated: (a) English models are applied to translated texts, and the predicted annotations are then transferred back to the source language; (b) existing high-quality annotations are transferred through translation and then used to train NLP models in the target language. Effectiveness and transfer loss are evaluated on the German Berlin-Tübingen-Oncology Corpus (BRONCO) dataset, with external data transferred from NCBI disease, SemEval-2013 drug-drug interaction (DDI) and i2b2/VA 2010. Applying English models to translated clinical texts is an attempt to exploit the advantages of those models, such as large pre-trained biomedical word embeddings. To support progress in this area, we provide a general-purpose pipeline for transferring annotated data in BRAT or CoNLL format to various target languages. For the entity class medication, an F1-score of 0.806 was obtained after re-alignment. The diagnosis and treatment classes were less successful, with F1-scores just below 0.5, owing to differences in annotation guidelines.
- Anthology ID:
- 2022.clinicalnlp-1.6
- Volume:
- Proceedings of the 4th Clinical Natural Language Processing Workshop
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, WA
- Venue:
- ClinicalNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 53–62
- URL:
- https://aclanthology.org/2022.clinicalnlp-1.6
- DOI:
- 10.18653/v1/2022.clinicalnlp-1.6
- Cite (ACL):
- Henning Schäfer, Ahmad Idrissi-Yaghir, Peter Horn, and Christoph Friedrich. 2022. Cross-Language Transfer of High-Quality Annotations: Combining Neural Machine Translation with Cross-Linguistic Span Alignment to Apply NER to Clinical Texts in a Low-Resource Language. In Proceedings of the 4th Clinical Natural Language Processing Workshop, pages 53–62, Seattle, WA. Association for Computational Linguistics.
- Cite (Informal):
- Cross-Language Transfer of High-Quality Annotations: Combining Neural Machine Translation with Cross-Linguistic Span Alignment to Apply NER to Clinical Texts in a Low-Resource Language (Schäfer et al., ClinicalNLP 2022)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2022.clinicalnlp-1.6.pdf
- Code:
- 0xhesch/clat-cross-lingual-annotation-transfer
- Data:
- 2010 i2b2/VA, DDI, MIMIC-III, NCBI Disease
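
The abstract reports entity-level F1-scores for annotations in CoNLL format. As a rough illustration only, the sketch below shows one common way such a score can be computed: BIO tags are converted to spans, and predicted spans are compared with gold spans by exact match. The evaluation protocol actually used in the paper is not stated on this page, so exact matching is an assumption, and the function names `bio_to_spans` and `span_f1` as well as the labels `MED` and `DIAG` are hypothetical.

```python
from typing import List, Set, Tuple

# A span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labelled token spans."""
    spans: Set[Span] = set()
    start = None
    label = None
    # A trailing "O" sentinel flushes a span that ends on the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues_span = tag.startswith("I-") and label == tag[2:]
        if not continues_span:
            if label is not None:
                spans.add((start, i, label))
                start, label = None, None
            # A stray "I-" without a preceding "B-" is treated as a span start.
            if tag.startswith(("B-", "I-")):
                start, label = i, tag[2:]
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F1: a prediction counts only if boundaries and label agree."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, for illustration only.
gold_tags = ["B-MED", "I-MED", "O", "B-DIAG"]
pred_tags = ["B-MED", "I-MED", "O", "O"]
score = span_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
print(round(score, 3))
```

Exact matching is strict: a transferred span that is off by a single token counts as both a false positive and a false negative, which is one reason a re-alignment step can change the score noticeably.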