Enigma at #SMM4H–HeaRD 2026: Leveraging Multilingual Pre-trained Models for Clinical Named Entity Recognition

Sylvia Vassileva, Plamena Ilieva, Teodor Svetoslavov Kostadinov, Monika Peteva Petkova, Daniel Manchevski, Vitosh Doynov, Ivan Koychev, Svetla Boytcheva


Abstract
This paper addresses the MultiClinAI challenge, subtask MultiClinNER, which focuses on clinical Named Entity Recognition (NER) across seven languages: Czech, Dutch, English, Italian, Romanian, Spanish, and Swedish. The main goal of MultiClinNER is to identify and extract clinical terms specifically related to diseases, procedures, and symptoms from discharge summaries. The paper explores a variety of state-of-the-art methods, both monolingual and multilingual, ranging from pretrained, zero-shot, domain-adapted transformers to fine-tuned transformer models, and demonstrates the benefits of ensemble modeling. Data augmentation through external resources significantly enhanced the models’ ability to recognize clinical entities. Both monolingual and multilingual approaches showed complementary strengths depending on the language and entity type. The average F1 score achieved across the best models for each language and category is 0.6502.
Anthology ID:
2026.smm4h-1.19
Volume:
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
Month:
July
Year:
2026
Address:
San Diego, United States
Editors:
Guillermo Lopez-Garcia, Graciela Gonzalez-Hernandez
Venues:
SMM4H | WS
Publisher:
Association for Computational Linguistics
Pages:
113–120
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.19/
Cite (ACL):
Sylvia Vassileva, Plamena Ilieva, Teodor Svetoslavov Kostadinov, Monika Peteva Petkova, Daniel Manchevski, Vitosh Doynov, Ivan Koychev, and Svetla Boytcheva. 2026. Enigma at #SMM4H–HeaRD 2026: Leveraging Multilingual Pre-trained Models for Clinical Named Entity Recognition. In Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks, pages 113–120, San Diego, United States. Association for Computational Linguistics.
Cite (Informal):
Enigma at #SMM4H–HeaRD 2026: Leveraging Multilingual Pre-trained Models for Clinical Named Entity Recognition (Vassileva et al., SMM4H 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.19.pdf