When Multilingual Models Compete with Monolingual Domain-Specific Models in Clinical Question Answering

Vojtech Lanz; Pavel Pecina

Abstract

This paper explores how general-domain multilingual models perform on the clinical Question Answering (QA) task, in order to assess their potential for medical support in languages that lack clinically trained models. To improve model performance, we exploit multilingual data augmentation by translating an English clinical QA dataset into six other languages. We propose a translation pipeline that includes projection of the evidence (answers) into the target languages, and we thoroughly evaluate several multilingual models fine-tuned on the augmented data in both mono- and multilingual settings. We find that the translation itself and the subsequent QA experiments pose different challenges for each language. Finally, we compare the performance of multilingual models with pretrained medical domain-specific English models on the original clinical English test set. Contrary to expectations, we find that monolingual domain-specific pretraining is not always superior to general-domain multilingual pretraining. The source code is available at https://github.com/lanzv/Multilingual-emrQA

Anthology ID: 2025.cl4health-1.6
Volume: Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)
Month: May
Year: 2025
Address: Albuquerque, New Mexico
Editors: Sophia Ananiadou, Dina Demner-Fushman, Deepak Gupta, Paul Thompson
Venues: CL4Health | WS
Publisher: Association for Computational Linguistics
Pages: 69–82
URL: https://preview.aclanthology.org/fix-sig-urls/2025.cl4health-1.6/
Cite (ACL): Vojtech Lanz and Pavel Pecina. 2025. When Multilingual Models Compete with Monolingual Domain-Specific Models in Clinical Question Answering. In Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health), pages 69–82, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): When Multilingual Models Compete with Monolingual Domain-Specific Models in Clinical Question Answering (Lanz & Pecina, CL4Health 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.cl4health-1.6.pdf
