CAISA@SMM4H’22: Robust Cross-Lingual Detection of Disease Mentions on Social Media with Adversarial Methods

Akbar Karimi, Lucie Flek


Abstract
We propose adversarial methods for increasing the robustness of disease mention detection on social media. Our method applies adversarial data augmentation in the input space and in the embedding space when training the English BioBERT model. We evaluate it in the SocialDisNER challenge at SMM4H’22 on an annotated dataset of disease mentions in Spanish tweets. We find that both augmentation methods outperform a heuristic vocabulary-based baseline by a large margin. Additionally, the English BioBERT model performs strongly even on the Spanish dataset: with the full training set, which is large, it outperforms the data augmentation methods, whereas the augmentation methods show a significant advantage in a low-data setting.
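The abstract does not spell out the exact perturbation rules. As a rough illustration only, the sketch below shows a generic version of each idea in pure Python. The embedding-space part follows the Fast Gradient Method (FGM) of Miyato et al., r_adv = ε·g/‖g‖₂, where g is the gradient of the loss with respect to the input embedding. Here it is applied to a toy logistic classifier that stands in for BioBERT. The input-space part swaps adjacent characters inside tokens. All function names, the toy classifier, and the character-swap rule are assumptions chosen for illustration. They are not the authors' implementation.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(embedding, weights, label):
    """Binary cross-entropy of a toy linear classifier (hypothetical stand-in for BioBERT)."""
    p = _sigmoid(_dot(weights, embedding))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def loss_grad_wrt_embedding(embedding, weights, label):
    """Closed-form gradient of logistic_loss w.r.t. the embedding: (p - y) * w."""
    p = _sigmoid(_dot(weights, embedding))
    return [(p - label) * w for w in weights]


def fgm_perturbation(grad, epsilon=1.0):
    """FGM: scale the gradient to L2 norm epsilon (zero vector if the gradient vanishes)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def perturb_embedding(embedding, weights, label, epsilon=1.0):
    """Embedding-space adversarial example: move along the loss-increasing direction."""
    grad = loss_grad_wrt_embedding(embedding, weights, label)
    r_adv = fgm_perturbation(grad, epsilon)
    return [e + r for e, r in zip(embedding, r_adv)]


def char_swap(token, rng):
    """Input-space perturbation (hypothetical rule): swap two adjacent characters."""
    if len(token) < 2:
        return token
    i = rng.randrange(len(token) - 1)
    chars = list(token)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def augment_tokens(tokens, rate, rng):
    """Perturb each token with probability `rate`; the token count is unchanged."""
    return [char_swap(t, rng) if rng.random() < rate else t for t in tokens]
```

For example, `perturb_embedding([0.5, -0.2, 0.1], [1.0, 2.0, -1.0], label=1, epsilon=0.1)` returns a vector at L2 distance 0.1 from the input that has a higher loss under the toy classifier. Because `augment_tokens` never changes the number of tokens, per-token entity labels (e.g. BIO tags) stay aligned, which matters for span detection. With a real model, the gradient would come from backpropagation through the network rather than from this closed-form toy gradient.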
Anthology ID: 2022.smm4h-1.45
Volume: Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Graciela Gonzalez-Hernandez, Davy Weissenbacher
Venue: SMM4H
Publisher: Association for Computational Linguistics
Pages: 168–170
URL: https://aclanthology.org/2022.smm4h-1.45
Cite (ACL): Akbar Karimi and Lucie Flek. 2022. CAISA@SMM4H’22: Robust Cross-Lingual Detection of Disease Mentions on Social Media with Adversarial Methods. In Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task, pages 168–170, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal): CAISA@SMM4H’22: Robust Cross-Lingual Detection of Disease Mentions on Social Media with Adversarial Methods (Karimi & Flek, SMM4H 2022)
PDF: https://preview.aclanthology.org/landing_page/2022.smm4h-1.45.pdf