Yao Ge


2024

For the past nine years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in publicly available user-generated content. This year, #SMM4H included seven shared tasks in English, Japanese, German, French, and Spanish from Twitter, Reddit, and health forums. A total of 84 teams from 22 countries registered for #SMM4H, and 45 teams participated in at least one task. This represents a growth of 180% and 160% in registration and participation, respectively, compared to the last iteration. This paper provides an overview of the tasks and participating systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.

2022

For the past seven years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted the community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in public, user-generated content. This seventh iteration consists of ten tasks that include English and Spanish posts on Twitter, Reddit, and WebMD. Interest in the #SMM4H shared tasks continues to grow, with 117 teams that registered and 54 teams that participated in at least one task—a 17.5% and 35% increase in registration and participation, respectively, over the last iteration. This paper provides an overview of the tasks and participants’ systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.

2021

This paper describes our approach for the automatic grading of evidence task from the Australasian Language Technology Association (ALTA) Shared Task 2021. We developed two classification models with SVM and RoBERTa and applied an ensemble technique to combine the grades from different classifiers. Our results showed that the SVM model achieved comparable results to the RoBERTa model, and the ensemble system outperformed the individual models on this task. Our system achieved the first place among five teams and obtained 3.3% higher accuracy than the second place.
This paper describes our approach for six classification tasks (Tasks 1a, 3a, 3b, 4 and 5) and one span detection task (Task 1b) from the Social Media Mining for Health (SMM4H) 2021 shared tasks. We developed two separate systems for classification and span detection, both based on pre-trained Transformer-based models. In addition, we applied oversampling and classifier ensembling in the classification tasks. The results of our submissions are over the median scores in all tasks except for Task 1a. Furthermore, our model achieved first place in Task 4 and obtained a 7% higher F1-score than the median in Task 1b.