2024
Overview of the 9th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at ACL 2024 – Large Language Models and Generalizability for Social Media NLP
Dongfang Xu | Guillermo Garcia | Lisa Raithel | Philippe Thomas | Roland Roller | Eiji Aramaki | Shoko Wakamiya | Shuntaro Yada | Pierre Zweigenbaum | Karen O’Connor | Sai Samineni | Sophia Hernandez | Yao Ge | Swati Rajwal | Sudeshna Das | Abeed Sarker | Ari Klein | Ana Schmidt | Vishakha Sharma | Raul Rodriguez-Esteban | Juan Banda | Ivan Amaro | Davy Weissenbacher | Graciela Gonzalez-Hernandez
Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks
For the past nine years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in publicly available user-generated content. This year, #SMM4H included seven shared tasks in English, Japanese, German, French, and Spanish from Twitter, Reddit, and health forums. A total of 84 teams from 22 countries registered for #SMM4H, and 45 teams participated in at least one task. This represents a growth of 180% and 160% in registration and participation, respectively, compared to the last iteration. This paper provides an overview of the tasks and participating systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.
2022
Overview of the Seventh Social Media Mining for Health Applications (#SMM4H) Shared Tasks at COLING 2022
Davy Weissenbacher | Juan Banda | Vera Davydova | Darryl Estrada Zavala | Luis Gasco Sánchez | Yao Ge | Yuting Guo | Ari Klein | Martin Krallinger | Mathias Leddin | Arjun Magge | Raul Rodriguez-Esteban | Abeed Sarker | Lucia Schmidt | Elena Tutubalina | Graciela Gonzalez-Hernandez
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
For the past seven years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted the community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in public, user-generated content. This seventh iteration consists of ten tasks that include English and Spanish posts on Twitter, Reddit, and WebMD. Interest in the #SMM4H shared tasks continues to grow, with 117 teams that registered and 54 teams that participated in at least one task—a 17.5% and 35% increase in registration and participation, respectively, over the last iteration. This paper provides an overview of the tasks and participants’ systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.
2021
Pre-trained Transformer-based Classification and Span Detection Models for Social Media Health Applications
Yuting Guo | Yao Ge | Mohammed Ali Al-Garadi | Abeed Sarker
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
This paper describes our approach for six classification tasks (Tasks 1a, 3a, 3b, 4 and 5) and one span detection task (Task 1b) from the Social Media Mining for Health (SMM4H) 2021 shared tasks. We developed two separate systems for classification and span detection, both based on pre-trained Transformer-based models. In addition, we applied oversampling and classifier ensembling in the classification tasks. Our submissions scored above the median in all tasks except Task 1a. Furthermore, our model achieved first place in Task 4 and obtained an F1-score 7% higher than the median in Task 1b.
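The abstract above names two generic techniques, minority-class oversampling and classifier ensembling, without implementation detail. The sketch below is only an illustration of those two ideas, not the authors' code: the stand-in classifiers, label names, and majority-vote combination rule are assumptions made for the example, whereas the paper itself fine-tuned pre-trained Transformer models.

```python
# Minimal sketch (not the authors' released code) of random oversampling of
# minority classes and majority-vote ensembling of several text classifiers.
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def oversample(texts: Sequence[str], labels: Sequence[str],
               seed: int = 0) -> Tuple[List[str], List[str]]:
    """Duplicate minority-class examples until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, lab in zip(texts, labels) if lab == label]
        extra = rng.choices(pool, k=target - count)
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_texts, out_labels


def ensemble_predict(classifiers: Sequence[Callable[[str], str]], text: str) -> str:
    """Combine label predictions from several classifiers by majority vote."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    texts = ["rash after taking the drug", "feeling fine today", "no side effects at all"]
    labels = ["ADE", "NoADE", "NoADE"]  # hypothetical binary ADE labels
    print(oversample(texts, labels))
    stand_ins = [lambda t: "ADE", lambda t: "NoADE", lambda t: "ADE"]  # placeholder models
    print(ensemble_predict(stand_ins, texts[0]))
```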
An Ensemble Model for Automatic Grading of Evidence
Yuting Guo | Yao Ge | Ruqi Liao | Abeed Sarker
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association
This paper describes our approach for the automatic grading of evidence task from the Australasian Language Technology Association (ALTA) Shared Task 2021. We developed two classification models with SVM and RoBERTa and applied an ensemble technique to combine the grades from different classifiers. Our results showed that the SVM model achieved results comparable to the RoBERTa model, and that the ensemble system outperformed the individual models on this task. Our system placed first among the five participating teams and obtained 3.3% higher accuracy than the second-place system.
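As a rough illustration of the ensemble idea described in the abstract (not the authors' code), the sketch below averages class probabilities from a TF-IDF + SVM grader and a stand-in for a fine-tuned RoBERTa grader. The grade labels, training snippets, and the probability-averaging rule are all assumptions made for the example.

```python
# Minimal sketch (assumptions throughout): grade evidence by averaging class
# probabilities from a TF-IDF + SVM model and a second model that merely
# stands in for a fine-tuned RoBERTa classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny hypothetical training set of evidence descriptions and grades (A-D).
train_texts = [
    "systematic review of multiple randomized controlled trials",
    "single well-conducted randomized controlled trial",
    "small observational cohort study",
    "case report with no control group",
] * 5  # repeated so SVC's internal probability calibration has enough samples per class
train_grades = ["A", "B", "C", "D"] * 5

svm = make_pipeline(TfidfVectorizer(), SVC(kernel="linear", probability=True))
svm.fit(train_texts, train_grades)


def roberta_proba(texts):
    """Stand-in for a fine-tuned RoBERTa grader; returns uniform class probabilities."""
    n_classes = len(svm.classes_)
    return np.full((len(texts), n_classes), 1.0 / n_classes)


def ensemble_grade(texts):
    """Average the two models' class probabilities and pick the highest-scoring grade."""
    probs = (svm.predict_proba(texts) + roberta_proba(texts)) / 2.0
    return [svm.classes_[i] for i in probs.argmax(axis=1)]


print(ensemble_grade(["meta-analysis of several randomized trials"]))
```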