Petr Zelina

2026

Discovery@FI at #SMM4H–HeaRD 2026: Ensemble Character Classifier for Multilingual Clinical NER
Petr Zelina | Vit Novacek
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks

We present a system for multilingual clinical named entity recognition (NER) submitted to the MultiClinNER subtask of MultiClinAI 2026, covering all seven languages and three entity classes (disease, symptom, procedure). Our approach trains one binary token classifier ensemble per entity class using cross-lingual fine-tuning of XLM-RoBERTa-large, with all languages handled jointly. We apply character-level ensembling over six models (two encoder variants × three cross-validation folds). This ensembling method provides more granular probability estimates than single-model classifiers, allowing for more flexible precision-recall trade-off tuning. The system achieves character-level F1 scores of 0.70–0.88 on the official test set.
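The character-level ensembling idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the six models' outputs are combined, so this sketch assumes a plain average of per-character probabilities followed by a single decision threshold. All function names, the token format `(char_start, char_end, probability)`, and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical token format: (char_start, char_end, P(token belongs to the entity class)).
Token = Tuple[int, int, float]


def token_probs_to_chars(text_len: int, tokens: List[Token]) -> List[float]:
    """Copy each token's probability onto every character it covers."""
    probs = [0.0] * text_len
    for start, end, p in tokens:
        for i in range(start, end):
            probs[i] = p
    return probs


def ensemble_char_probs(text_len: int, model_outputs: List[List[Token]]) -> List[float]:
    """Average per-character probabilities across models (assumed combination rule)."""
    per_model = [token_probs_to_chars(text_len, toks) for toks in model_outputs]
    return [sum(col) / len(per_model) for col in zip(*per_model)]


def extract_spans(text: str, char_probs: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return maximal runs of characters with probability >= threshold, whitespace-trimmed."""
    spans = []
    start = None
    for i, p in enumerate(list(char_probs) + [0.0]):  # sentinel closes a trailing run
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            s, e = start, i
            while s < e and text[s].isspace():
                s += 1
            while e > s and text[e - 1].isspace():
                e -= 1
            if s < e:
                spans.append((s, e))
            start = None
    return spans


text = "severe chest pain today"
# Two hypothetical models that segment the text differently.
model_a = [(0, 6, 0.2), (6, 12, 0.9), (12, 17, 0.8), (17, 23, 0.1)]
model_b = [(0, 6, 0.6), (6, 9, 0.7), (9, 12, 0.7), (12, 17, 0.9), (17, 23, 0.1)]

char_probs = ensemble_char_probs(len(text), [model_a, model_b])
for threshold in (0.5, 0.3):
    spans = extract_spans(text, char_probs, threshold)
    print(threshold, [text[s:e] for s, e in spans])
```

In this toy example, a threshold of 0.5 yields the span "chest pain", while lowering it to 0.3 extends the span to "severe chest pain". Because the averaged scores take more distinct values than any single model's output, the threshold can be moved in finer steps, which is one way to read the precision-recall flexibility the abstract describes.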

Co-authors

Vit Novacek (1)

Venues

SMM4H (1)
WS (1)
