UCCNLP@SMM4H’22:Label distribution aware long-tailed learning with post-hoc posterior calibration applied to text classification

Paul Trust, Provia Kadusabe, Ahmed Zahran, Rosane Minghim, Kizito Omala


Abstract
This paper describes our submissions to the Social Media Mining for Health (SMM4H) 2022 workshop shared tasks. We participated in two tasks: (1) classification of adverse drug event (ADE) mentions in English tweets (Task 1a) and (2) classification of self-reported intimate partner violence (IPV) on Twitter (Task 7). We propose an approach that uses RoBERTa (A Robustly Optimized BERT Pretraining Approach) fine-tuned with a label-distribution-aware margin loss function and post-hoc posterior calibration, so that inference is robust to class imbalance. Compared with the traditional fine-tuning strategy using an unweighted cross-entropy loss, we achieved performance increases of 4% on IPV and 1% on ADE.
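The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the margin loss follows the commonly used form in which each class margin is proportional to n_j^(-1/4), and that the post-hoc calibration subtracts a scaled log class prior from the logits at inference time. The function names, the `max_margin`, `scale`, and `tau` defaults, and the toy class counts are all hypothetical choices made for illustration.

```python
import numpy as np

def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4); the rarest class gets max_margin (assumed default)."""
    counts = np.asarray(class_counts, dtype=float)
    raw = 1.0 / np.power(counts, 0.25)
    return raw * (max_margin / raw.max())

def log_softmax(z):
    """Numerically stable row-wise log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def ldam_loss(logits, labels, margins, scale=30.0):
    """Cross-entropy after subtracting the class margin from the true-class logit (scale is an assumed default)."""
    adjusted = np.asarray(logits, dtype=float).copy()
    labels = np.asarray(labels)
    rows = np.arange(len(labels))
    adjusted[rows, labels] -= margins[labels]
    return -log_softmax(scale * adjusted)[rows, labels].mean()

def calibrate_logits(logits, class_counts, tau=1.0):
    """Post-hoc adjustment (assumed form): subtract tau * log(prior) so rare classes are not under-predicted."""
    priors = np.asarray(class_counts, dtype=float)
    priors = priors / priors.sum()
    return np.asarray(logits, dtype=float) - tau * np.log(priors)

# Toy binary setting with a 9:1 class imbalance (made-up counts, not from the paper).
counts = [9000, 1000]
margins = ldam_margins(counts)
logits = np.array([[0.5, 0.0]])           # raw model scores slightly favour the frequent class
labels = np.array([1])                    # the true label is the rare class
loss = ldam_loss(logits, labels, margins)
calibrated = calibrate_logits(logits, counts)
```

In this toy case the rare class receives the larger margin during training, and the calibration step shifts the decision toward the rare class at inference time. In practice the logits would come from the fine-tuned RoBERTa classifier head rather than a hand-written array.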
Anthology ID:
2022.smm4h-1.26
Volume:
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
SMM4H
Publisher:
Association for Computational Linguistics
Pages:
90–94
URL:
https://aclanthology.org/2022.smm4h-1.26
Cite (ACL):
Paul Trust, Provia Kadusabe, Ahmed Zahran, Rosane Minghim, and Kizito Omala. 2022. UCCNLP@SMM4H’22:Label distribution aware long-tailed learning with post-hoc posterior calibration applied to text classification. In Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task, pages 90–94, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
UCCNLP@SMM4H’22:Label distribution aware long-tailed learning with post-hoc posterior calibration applied to text classification (Trust et al., SMM4H 2022)
PDF:
https://preview.aclanthology.org/nodalida-main-page/2022.smm4h-1.26.pdf