Abstract
We propose adversarial methods for increasing the robustness of disease mention detection on social media. Our method applies adversarial data augmentation on the input and the embedding spaces to the English BioBERT model. We evaluate our method in the SocialDisNER challenge at SMM4H’22 on an annotated dataset of disease mentions in Spanish tweets. We find that both methods outperform a heuristic vocabulary-based baseline by a large margin. Additionally, utilizing the English BioBERT model shows a strong performance and outperforms the data augmentation methods even when applied to the Spanish dataset, which has a large amount of data, while augmentation methods show a significant advantage in a low-data setting.
- Anthology ID:
- 2022.smm4h-1.45
- Volume:
- Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Graciela Gonzalez-Hernandez, Davy Weissenbacher
- Venue:
- SMM4H
- Publisher:
- Association for Computational Linguistics
- Pages:
- 168–170
- URL:
- https://aclanthology.org/2022.smm4h-1.45
- Cite (ACL):
- Akbar Karimi and Lucie Flek. 2022. CAISA@SMM4H’22: Robust Cross-Lingual Detection of Disease Mentions on Social Media with Adversarial Methods. In Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task, pages 168–170, Gyeongju, Republic of Korea. Association for Computational Linguistics.
- Cite (Informal):
- CAISA@SMM4H’22: Robust Cross-Lingual Detection of Disease Mentions on Social Media with Adversarial Methods (Karimi & Flek, SMM4H 2022)
- PDF:
- https://preview.aclanthology.org/landing_page/2022.smm4h-1.45.pdf