Andres Duque


2026

This paper describes the participation of the LSI_UNED team in the first sub-task of MultiClinAI at the #SMM4H-HeaRD 2026 Workshop, which focuses on multilingual clinical named entity recognition in seven languages. The task requires identifying mentions of diseases, procedures, and symptoms in clinical case reports. We propose a set of systems based on the W2NER architecture, with a separate model trained for each language and entity type. For Spanish, we use a RoBERTa-based model with data augmentation from additional NER resources, while the English and Italian systems are based on different biomedical BERT variants. Results show consistent performance across languages, with the best overall results obtained for Spanish. Data augmentation improves recall and F1, while the English and Italian models achieve competitive but slightly lower scores. Symptom recognition remains the most challenging entity type across all languages.

2016