Discovery@FI at #SMM4H–HeaRD 2026: Ensemble Character Classifier for Multilingual Clinical NER

Petr Zelina, Vit Novacek


Abstract
We present a system for multilingual clinical named entity recognition (NER) submitted to the MultiClinNER subtask of MultiClinAI 2026, covering all seven languages and three entity classes (disease, symptom, procedure). Our approach trains one binary token classifier ensemble per entity class using cross-lingual fine-tuning of XLM-RoBERTa-large, with all languages handled jointly. We apply character-level ensembling over six models (two encoder variants × three cross-validation folds). This ensembling method provides more granular probability estimates than single-model classifiers, allowing for more flexible precision-recall trade-off tuning. The system achieves character-level F1 scores of 0.70–0.88 on the official test set.
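The character-level ensembling described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`char_probs`, `ensemble_chars`, `extract_spans`), the plain averaging rule, the token offsets, and the probabilities are all assumptions made for illustration, and only two hypothetical models are shown instead of six. Each model's token-level probability for one entity class is projected onto the characters its token covers, the per-character values are averaged across models, and a tunable threshold turns them into character spans.

```python
from typing import List, Tuple

# One token prediction: (start_char, end_char, probability of the entity class).
# The offsets are assumed to come from a tokenizer that reports character offsets.
TokenPred = Tuple[int, int, float]


def char_probs(text_len: int, tokens: List[TokenPred]) -> List[float]:
    """Project token probabilities onto characters; uncovered characters get 0.0."""
    probs = [0.0] * text_len
    for start, end, p in tokens:
        for i in range(start, min(end, text_len)):
            probs[i] = p
    return probs


def ensemble_chars(text_len: int, model_preds: List[List[TokenPred]]) -> List[float]:
    """Average the per-character probabilities over all models (assumed rule)."""
    per_model = [char_probs(text_len, toks) for toks in model_preds]
    return [sum(column) / len(per_model) for column in zip(*per_model)]


def extract_spans(probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return maximal [start, end) character runs with probability >= threshold."""
    spans, start = [], None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(probs)))
    return spans


# Hypothetical example: two models with different token boundaries.
text = "no hepatomegaly"
model_a = [(0, 2, 0.1), (2, 9, 0.9), (9, 15, 0.7)]
model_b = [(0, 2, 0.2), (2, 6, 0.4), (6, 15, 0.8)]

averaged = ensemble_chars(len(text), [model_a, model_b])
balanced_spans = extract_spans(averaged, threshold=0.5)
strict_spans = extract_spans(averaged, threshold=0.7)
```

Because the averaged values are defined per character, raising the threshold in this sketch shrinks the predicted span to the characters on which the models agree most strongly, which is one way the precision-recall trade-off mentioned in the abstract could be tuned.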
Anthology ID:
2026.smm4h-1.28
Volume:
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
Month:
July
Year:
2026
Address:
San Diego, United States
Editors:
Guillermo Lopez-Garcia, Graciela Gonzalez-Hernandez
Venues:
SMM4H | WS
Publisher:
Association for Computational Linguistics
Pages:
173–176
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.28/
Cite (ACL):
Petr Zelina and Vit Novacek. 2026. Discovery@FI at #SMM4H–HeaRD 2026: Ensemble Character Classifier for Multilingual Clinical NER. In Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks, pages 173–176, San Diego, United States. Association for Computational Linguistics.
Cite (Informal):
Discovery@FI at #SMM4H–HeaRD 2026: Ensemble Character Classifier for Multilingual Clinical NER (Zelina & Novacek, SMM4H 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.28.pdf