Abstract
Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they are observing in real time. Because of this, more agencies, such as disaster relief organizations and news agencies, are interested in programmatically monitoring Twitter, and recognizing the informativeness of a tweet can help them filter noise from large volumes of data. In this paper, we present our submission for WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets. Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet, trained in a Semi-Supervised Learning (SSL) setting. The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a baseline system using fastText embeddings.
- Anthology ID: 2020.wnut-1.67
- Original: 2020.wnut-1.67v1
- Version 2: 2020.wnut-1.67v2
- Volume: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
- Month: November
- Year: 2020
- Address: Online
- Editors: Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
- Venue: WNUT
- Publisher: Association for Computational Linguistics
- Pages: 455–461
- URL: https://aclanthology.org/2020.wnut-1.67
- DOI: 10.18653/v1/2020.wnut-1.67
- Cite (ACL): Nickil Maveli. 2020. EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 455–461, Online. Association for Computational Linguistics.
- Cite (Informal): EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets (Maveli, WNUT 2020)
- PDF: https://preview.aclanthology.org/naacl24-info/2020.wnut-1.67.pdf
- Data: WNUT-2020 Task 2
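The abstract names three ingredients: an ensemble of transformers, a semi-supervised setting, and an F1 score. The sketch below shows one plausible way these pieces fit together. It is not the author's implementation. It assumes that each model outputs P(informative) per tweet and that the models are combined by soft voting (averaging those probabilities). For the SSL step, it substitutes simple confidence-threshold pseudo-labeling as a stand-in. Scoring uses binary F1 on the informative (positive) class. The function names, the 0.9 threshold, and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average each tweet's P(informative) across models (soft voting)."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]


def pseudo_label(probs: Sequence[float], threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Hypothetical SSL step: keep (index, label) pairs for unlabeled tweets
    the ensemble is confident about, to be added to the training data."""
    selected = []
    for i, p in enumerate(probs):
        if p >= threshold:
            selected.append((i, 1))  # confidently informative
        elif p <= 1 - threshold:
            selected.append((i, 0))  # confidently uninformative
    return selected


def f1_positive(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 score for the positive (informative = 1) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # also avoids dividing by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up per-model probabilities for four tweets (one row per model).
    model_probs = [
        [0.95, 0.20, 0.60, 0.05],  # e.g. a RoBERTa classifier
        [0.90, 0.30, 0.40, 0.02],  # e.g. an XLNet classifier
        [0.97, 0.10, 0.70, 0.08],  # e.g. a BERTweet classifier
    ]
    probs = soft_vote(model_probs)
    print(pseudo_label(probs))  # confident tweets only
    preds = [1 if p >= 0.5 else 0 for p in probs]
    print(f1_positive([1, 0, 0, 0], preds))
```

Soft voting is used here because it lets a confident model outweigh uncertain ones. Majority (hard) voting or learned weights are equally valid choices, and the abstract does not say which combination rule the system uses.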