Jose Cortina
Also published as: Jose Miguel Acitores Cortina
2026
Overview of the 11th Social Media Mining for Health (#SMM4H) and Health Real-World Data (HeaRD) Shared Tasks at ACL 2026
Guillermo Lopez-Garcia | Jose Miguel Acitores Cortina | Jacob Berkowitz | Joey Chan | Sumon Kanti Dey | Ivan Flores Amaro | Fernando Gallego | Lauren Gryboski | Ari Z. Klein | Farnoush Zeidi Kolehparcheh | Martin Krallinger | Salvador Lima-Lopez | Yujun Ma | Tomohiro Nishiyama | Ahmad Rezaie Mianroodi | Amirali Rezaie Mianroodi | Lisa Raithel | Roland Roller | Judith Rosell | Frank Rudzicz | Abeed Sarker | Nicholas Tatonetti | Philippe Thomas | Elena Tutubalina | Dongfang Xu | Farnaz Zeidi | Yu Zhai | Pierre Zweigenbaum | Graciela Gonzalez-Hernandez
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
The aim of the Social Media Mining for Health Applications and Health Real-World Data (#SMM4H-HeaRD) shared tasks is to foster the development and evaluation of natural language processing, machine learning, and artificial intelligence methods for analyzing health-related text from social media and other real-world data sources. For the 11th iteration, held online and co-located with ACL 2026, the workshop continued the expanded #SMM4H-HeaRD platform initiated in 2025, broadening its scope beyond social media to include additional health real-world data sources such as clinical narratives and biomedical literature. The 8 shared tasks covered diverse data sources, health domains (e.g., adverse drug events, insomnia, influenza vaccine effectiveness, cancer staging, substance use), and task formulations (e.g., classification, named entity recognition, span extraction, and text generation). In total, 110 teams registered, representing 31 countries. In this paper, we present an overview of the datasets, participant systems, and performance results, providing insights into current methods for mining social media and health real-world data for biomedical and clinical applications.
Overview of #SMM4H-HeaRD 2026 – Task 6: Predicting TNM staging from pathology reports
Jose Miguel Acitores Cortina | Jacob S. Berkowitz | Nadine A. Friedrich | Nicholas P Tatonetti
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
This paper provides an overview of Task 6 from the Social Media Mining for Health/Health Real-World Data shared task (#SMM4H-HeaRD 2026), which focused on predicting TNM staging from pathology reports from TCGA. Seven teams submitted systems spanning fine-tuned clinical encoders, open-source generative LLMs, and closed-source API models. On a straightforward test set, most teams achieved near-perfect F1 scores (average 0.993, 0.972, and 0.957 for T, N, and M). However, on a harder tiebreak set where explicit TNM notation was removed and staging had to be inferred from clinical descriptions, performance dropped substantially (average 0.725, 0.783, and 0.846). Notably, the two teams using large closed-source API models generalized best to the harder set, achieving the highest T and N scores despite not leading on the easy set. These results suggest that while fine-tuned domain-specific encoders excel at surface-level extraction, larger general-purpose LLMs may be more robust when staging must be inferred from contextual clinical findings. All teams surpassed baseline overall performance on both test sets.
2024
TLab at #SMM4H 2024: Retrieval-Augmented Generation for ADE Extraction and Normalization
Jacob Berkowitz | Apoorva Srinivasan | Jose Miguel Acitores Cortina | Nicholas P Tatonetti
Proceedings of the 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks
SMM4H 2024 Task 1 focuses on the identification of standardized Adverse Drug Events (ADEs) in tweets. We introduce a novel Retrieval-Augmented Generation (RAG) method, leveraging the capabilities of Llama 3, GPT-4, and the SFR-embedding-mistral model, along with few-shot prompting techniques, to map colloquial tweet language to MedDRA Preferred Terms (PTs) without relying on extensive training datasets. Our method achieved competitive performance, with an F1 score of 0.359 in the normalization task and 0.392 in the named entity recognition (NER) task. Notably, our model demonstrated robustness in identifying previously unseen MedDRA PTs (F1=0.363), greatly surpassing the median task score of 0.141 for such terms.
Co-authors
- Jacob Berkowitz 2
- Nicholas P Tatonetti 2
- Jacob S. Berkowitz 1
- Joey Chan 1
- Sumon Kanti Dey 1
- Ivan Flores Amaro 1
- Nadine A. Friedrich 1
- Fernando Gallego 1
- Graciela Gonzalez-Hernandez 1
- Lauren Gryboski 1
- Ari Z. Klein 1
- Martin Krallinger 1
- Salvador Lima-Lopez 1
- Guillermo Lopez-Garcia 1
- Yujun Ma 1
- Tomohiro Nishiyama 1
- Lisa Raithel 1
- Ahmad Rezaie Mianroodi 1
- Amirali Rezaie Mianroodi 1
- Roland Roller 1
- Judith Rosell 1
- Frank Rudzicz 1
- Abeed Sarker 1
- Apoorva Srinivasan 1
- Nicholas Tatonetti 1
- Philippe Thomas 1
- Elena Tutubalina 1
- Dongfang Xu 1
- Farnaz Zeidi 1
- Farnoush Zeidi Kolehparcheh 1
- Yu Zhai 1
- Pierre Zweigenbaum 1