SIEMENS at #SMM4H–HeaRD 2026: The Impact of Training Strategy and Backbone Selection on BERT-based Multilingual Clinical NER

Manuela Daniela Danu

Abstract

This paper describes our participation in the MultiClinNER subtask of the MultiClinAI shared task, part of the #SMM4H-HeaRD Workshop at ACL 2026. The task requires identifying DISEASE, SYMPTOM, and PROCEDURE mentions in clinical case reports across seven languages: Czech, Dutch, English, Italian, Romanian, Spanish, and Swedish. We compare two BERT-based sequence labeling methods: (i) sentence-level token classification with a fixed train/validation split, and (ii) paragraph-level chunking with 5-fold cross-validation and checkpoint merging, using language-specific BERT models and multilingual XLM-RoBERTa-large as backbones. Our results show that 5-fold training with checkpoint merging consistently outperforms the fixed split strategy, with further analysis suggesting that the gains are primarily driven by improved training-set coverage rather than by differences in input granularity. Language-specific BERT encoders prove most effective for Spanish and English, while XLM-RoBERTa-large yields the strongest results for the remaining five languages through cross-lingual transfer.
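The abstract does not say how the fold checkpoints are merged. As a minimal sketch, assuming the merge is a uniform average of parameters across the five fold models, the strategy could look like the following. The names `k_fold_indices`, `merge_checkpoints`, and the `Checkpoint` type are hypothetical and are not taken from the paper; plain Python lists stand in for model tensors.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a model checkpoint: parameter name -> flat weights.
Checkpoint = Dict[str, List[float]]


def k_fold_indices(n_items: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train, validation) index lists for k contiguous folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds take one additional item each.
        size = base + (1 if i < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        folds.append((train, val))
        start += size
    return folds


def merge_checkpoints(checkpoints: Sequence[Checkpoint]) -> Checkpoint:
    """Uniformly average same-named parameters (an assumed merge rule)."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    merged: Checkpoint = {}
    for name in checkpoints[0]:
        # Group the i-th weight of every checkpoint, then average each group.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(col) / len(checkpoints) for col in columns]
    return merged
```

With five folds, each document is held out for validation in exactly one fold and used for training in the other four, so the merged model draws on the full training set. A fixed train/validation split never trains on its validation portion, which is consistent with the coverage explanation given in the abstract.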

Anthology ID: 2026.smm4h-1.34
Volume: Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
Month: July
Year: 2026
Address: San Diego, United States
Editors: Guillermo Lopez-Garcia, Graciela Gonzalez-Hernandez
Venues: SMM4H | WS
Publisher: Association for Computational Linguistics
Pages: 216–221
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.34/
Cite (ACL): Manuela Daniela Danu. 2026. SIEMENS at #SMM4H–HeaRD 2026: The Impact of Training Strategy and Backbone Selection on BERT-based Multilingual Clinical NER. In Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks, pages 216–221, San Diego, United States. Association for Computational Linguistics.
Cite (Informal): SIEMENS at #SMM4H–HeaRD 2026: The Impact of Training Strategy and Backbone Selection on BERT-based Multilingual Clinical NER (Danu, SMM4H 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.34.pdf
