2021
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Arjun Magge
|
Ari Klein
|
Antonio Miranda-Escalada
|
Mohammed Ali Al-Garadi
|
Ilseyar Alimova
|
Zulfat Miftahutdinov
|
Eulalia Farre-Maduell
|
Salvador Lima Lopez
|
Ivan Flores
|
Karen O'Connor
|
Davy Weissenbacher
|
Elena Tutubalina
|
Abeed Sarker
|
Juan M Banda
|
Martin Krallinger
|
Graciela Gonzalez-Hernandez
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Overview of the Sixth Social Media Mining for Health Applications (#SMM4H) Shared Tasks at NAACL 2021
Arjun Magge
|
Ari Klein
|
Antonio Miranda-Escalada
|
Mohammed Ali Al-Garadi
|
Ilseyar Alimova
|
Zulfat Miftahutdinov
|
Eulalia Farre
|
Salvador Lima López
|
Ivan Flores
|
Karen O’Connor
|
Davy Weissenbacher
|
Elena Tutubalina
|
Abeed Sarker
|
Juan Banda
|
Martin Krallinger
|
Graciela Gonzalez-Hernandez
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
The global growth of social media usage over the past decade has opened research avenues for mining health-related information that can ultimately be used to improve public health. In its sixth iteration, the Social Media Mining for Health Applications (#SMM4H) shared tasks sought to advance the use of social media texts, such as tweets, for pharmacovigilance, disease tracking, and patient-centered outcomes. #SMM4H 2021 hosted a total of eight tasks, including reruns of adverse drug effect extraction in English and Russian as well as newer tasks such as detecting medication non-adherence from Twitter and WebMD forums, detecting self-reported adverse pregnancy outcomes, detecting cases and symptoms of COVID-19, identifying occupations mentioned in Spanish by Twitter users, and detecting self-reported breast cancer diagnoses. The eight tasks comprised a total of 12 individual subtasks spanning three languages and requiring methods for binary classification, multi-class classification, named entity recognition, and entity normalization. With 97 registered teams and 40 teams submitting predictions, interest in the shared tasks grew by 70% and participation by 38% compared to the previous iteration.
Pre-trained Transformer-based Classification and Span Detection Models for Social Media Health Applications
Yuting Guo
|
Yao Ge
|
Mohammed Ali Al-Garadi
|
Abeed Sarker
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
This paper describes our approach to six classification tasks (Tasks 1a, 3a, 3b, 4, and 5) and one span detection task (Task 1b) from the Social Media Mining for Health (SMM4H) 2021 shared tasks. We developed two separate systems, one for classification and one for span detection, both based on pre-trained Transformer models. In addition, we applied oversampling and classifier ensembling in the classification tasks. Our submissions exceeded the median scores in all tasks except Task 1a. Furthermore, our model achieved first place in Task 4 and obtained an F1-score 7% higher than the median in Task 1b.
2020
Benchmarking of Transformer-Based Pre-Trained Models on Social Media Text Classification Datasets
Yuting Guo
|
Xiangjue Dong
|
Mohammed Ali Al-Garadi
|
Abeed Sarker
|
Cecile Paris
|
Diego Mollá Aliod
Proceedings of the The 18th Annual Workshop of the Australasian Language Technology Association
Free text data from social media is now widely used in natural language processing research, and one of the most common machine learning tasks performed on this data is classification. Generally speaking, the performance of supervised classification algorithms on social media datasets is lower than on texts from other sources, but recently proposed transformer-based models have considerably improved upon legacy state-of-the-art systems. Currently, no study compares the performance of different variants of transformer-based models across a wide range of social media text classification datasets. In this paper, we benchmark transformer-based pre-trained models on 25 social media text classification datasets, 6 of which are health-related. We compare three pre-trained language models, RoBERTa-base, BERTweet, and ClinicalBioBERT, in terms of classification accuracy. Our experiments show that RoBERTa-base and BERTweet perform comparably on most datasets, and considerably better than ClinicalBioBERT, even on the health-related datasets.
Emory at WNUT-2020 Task 2: Combining Pretrained Deep Learning Models and Feature Enrichment for Informative Tweet Identification
Yuting Guo
|
Mohammed Ali Al-Garadi
|
Abeed Sarker
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
This paper describes the system developed by the Emory team for WNUT-2020 Task 2: “Identification of Informative COVID-19 English Tweets”. Our system explores three recent Transformer-based deep learning models pretrained on large-scale data to encode documents. Moreover, we developed two feature enrichment methods that enhance document embeddings by integrating emoji embeddings and syntactic features into the deep learning models. Our system achieved an F1-score of 0.897 and an accuracy of 90.1% on the test set, ranking in the top third of all 55 teams.