Farnaz Zeidi
2026
Overview of the 11th Social Media Mining for Health (#SMM4H) and Health Real-World Data (HeaRD) Shared Tasks at ACL 2026
Guillermo Lopez-Garcia | Jose Miguel Acitores Cortina | Jacob Berkowitz | Joey Chan | Sumon Kanti Dey | Ivan Flores Amaro | Fernando Gallego | Lauren Gryboski | Ari Z. Klein | Farnoush Zeidi Kolehparcheh | Martin Krallinger | Salvador Lima-Lopez | Yujun Ma | Tomohiro Nishiyama | Ahmad Rezaie Mianroodi | Amirali Rezaie Mianroodi | Lisa Raithel | Roland Roller | Judith Rosell | Frank Rudzicz | Abeed Sarker | Nicholas Tatonetti | Philippe Thomas | Elena Tutubalina | Dongfang Xu | Farnaz Zeidi | Yu Zhai | Pierre Zweigenbaum | Graciela Gonzalez-Hernandez
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
The aim of the Social Media Mining for Health Applications and Health Real-World Data (#SMM4H-HeaRD) shared tasks is to foster the development and evaluation of natural language processing, machine learning, and artificial intelligence methods for analyzing health-related text from social media and other real-world data sources. For the 11th iteration, held online and co-located with ACL 2026, the workshop continued the expanded #SMM4H-HeaRD platform initiated in 2025, broadening its scope beyond social media to include additional health real-world data sources such as clinical narratives and biomedical literature. The 8 shared tasks covered diverse data sources, health domains (e.g., adverse drug events, insomnia, influenza vaccine effectiveness, cancer staging, substance use), and task formulations (e.g., classification, named entity recognition, span extraction, and text generation). In total, 110 teams registered, representing 31 countries. In this paper, we present an overview of the datasets, participant systems, and performance results, providing insights into current methods for mining social media and health real-world data for biomedical and clinical applications.
PEI at #SMM4H-HeaRD 2026: Enhancing Patient Metadata Detection via Hypothesis-Conditioned Classification and Paraphrase-Based Data Augmentation
Farnaz Zeidi | Roman Christof | Farnoush Zeidi | Renate König | Liam Childs
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
This paper presents our approach to Task 5 of the #SMM4H-HeaRD 2026 Workshop, which focuses on detecting patient metadata in SARS-CoV-2 sequencing articles as a binary classification task. We explore both encoder-based and large language model (LLM) approaches, using BioM-BERT as a baseline and Mistral-Nemo as the LLM. To improve performance, we propose a paraphrase-based data augmentation pipeline using Qwen3, where paraphrased training and validation instances are added for fine-tuning. For the LLM, we perform prompt refinement and error analysis, while for the encoder-based model, we reformulate the task as a hypothesis-conditioned classification task inspired by Natural Language Inference (NLI). Our methods improve both models: Mistral-Nemo increases from 0.423 to 0.750 F1, and BioM-BERT from 0.801 to 0.821 on the validation set. Although Mistral-Nemo does not surpass BioM-BERT, our best BioM-BERT model achieves an F1-score of 0.786 on the test set, outperforming the mean and median of competing systems. To support reproducibility, we release our best-performing model on Hugging Face.
2025
MedLinkDE – MedDRA Entity Linking for German with Guided Chain of Thought Reasoning
Roman Christof | Farnaz Zeidi | Manuela Messelhäußer | Dirk Mentzer | Renate Koenig | Liam Childs | Alexander Mehler
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
In pharmacovigilance, effective automation of medical data structuring, especially linking entities to standardized terminologies such as MedDRA, is critical. This challenge is rarely addressed for German data. With MedLinkDE we address German MedDRA entity linking for adverse drug reactions in a two-step approach: (1) retrieval of medical terms with fine-tuned embedding models, followed by (2) guided chain-of-thought re-ranking using LLMs. To this end, we introduce RENOde, a German real-world MedDRA dataset consisting of reports from patients and healthcare professionals. To overcome the challenges posed by the linguistic diversity of these reports, we generate synthetic data mapping the two reporting styles of patients and healthcare professionals. Our embedding models, fine-tuned on these synthetic, quasi-personalized datasets, show competitive performance with real datasets in terms of accuracy at high top-k recall, providing a robust basis for re-ranking. Our subsequent guided Chain of Thought (CoT) re-ranking, informed by MedDRA coding guidelines, improves entity linking accuracy by approximately 15% (Acc@1) compared to embedding-only strategies. In this way, our approach demonstrates the feasibility of entity linking in medical reports under the constraints of data scarcity by relying on synthetic data reflecting different informant roles of reporting persons.
Co-authors
- Liam Childs 2
- Roman Christof 2
- Jacob Berkowitz 1
- Joey Chan 1
- Jose Cortina 1
- Sumon Kanti Dey 1
- Ivan Flores Amaro 1
- Fernando Gallego 1
- Graciela Gonzalez-Hernandez 1
- Lauren Gryboski 1
- Ari Z. Klein 1
- Renate Koenig 1
- Martin Krallinger 1
- Renate König 1
- Salvador Lima-Lopez 1
- Guillermo Lopez-Garcia 1
- Yujun Ma 1
- Alexander Mehler 1
- Dirk Mentzer 1
- Manuela Messelhäußer 1
- Tomohiro Nishiyama 1
- Lisa Raithel 1
- Ahmad Rezaie Mianroodi 1
- Amirali Rezaie Mianroodi 1
- Roland Roller 1
- Judith Rosell 1
- Frank Rudzicz 1
- Abeed Sarker 1
- Nicholas Tatonetti 1
- Philippe Thomas 1
- Elena Tutubalina 1
- Dongfang Xu 1
- Farnoush Zeidi 1
- Farnoush Zeidi Kolehparcheh 1
- Yu Zhai 1
- Pierre Zweigenbaum 1