BIT.UA at #SMM4H–HeaRD 2026: Towards Multi-Class Multilingual Clinical Entity Recognition with Multi-Head CRF Ensembles

Richard A. A. Jonker, Sérgio Matos


Abstract
This paper describes the BIT.UA system for the MultiClinNER shared task at #SMM4H–HeaRD 2026, targeting multilingual clinical named entity recognition (NER) across seven languages for three entity types (Disease, Procedure, Symptom). We extend the Multi-Head CRF architecture, originally developed for multi-class NER on Spanish clinical text, to the multilingual setting. To enable joint multi-entity training despite per-entity text variations in the dataset, we develop an adaptive text consolidation pipeline that preserves over 94% of annotations. Our central finding is that a single xlm-roberta-large model, trained jointly on all seven languages and three entity types, achieves competition rank 2 for five of the seven languages and outperforms dedicated monolingual models by up to 6.94 F1 points, while requiring only a single set of weights. Ensembling multiple seeds of this model achieves rank 1 for those five languages, and combining it with monolingual models yields rank 1 for the remaining two. Code and models are publicly available at https://github.com/ieeta-pt/Multi-Head-CRF/tree/MultiClinNER and https://huggingface.co/collections/IEETA/multiclinner-models.
Anthology ID:
2026.smm4h-1.8
Volume:
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
Month:
July
Year:
2026
Address:
San Diego, United States
Editors:
Guillermo Lopez-Garcia, Graciela Gonzalez-Hernandez
Venues:
SMM4H | WS
Publisher:
Association for Computational Linguistics
Pages:
41–48
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.8/
Cite (ACL):
Richard A. A. Jonker and Sérgio Matos. 2026. BIT.UA at #SMM4H–HeaRD 2026: Towards Multi-Class Multilingual Clinical Entity Recognition with Multi-Head CRF Ensembles. In Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks, pages 41–48, San Diego, United States. Association for Computational Linguistics.
Cite (Informal):
BIT.UA at #SMM4H–HeaRD 2026: Towards Multi-Class Multilingual Clinical Entity Recognition with Multi-Head CRF Ensembles (Jonker & Matos, SMM4H 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.8.pdf