Abstract
In this work, we describe our system for the WNUT-2020 shared task on the identification of informative COVID-19 English tweets. Our system is an ensemble of various machine learning methods, leveraging both traditional feature-based classifiers and recent advances in pre-trained language models that help capture the syntactic, semantic, and contextual features of the tweets. We further employ pseudo-labelling to incorporate the unlabelled Twitter data released on the pandemic. Our best-performing model achieves an F1-score of 0.9179 on the provided validation set and 0.8805 on the blind test set.
- Anthology ID:
- 2020.wnut-1.65
- Volume:
- Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
- Venue:
- WNUT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 444–449
- URL:
- https://aclanthology.org/2020.wnut-1.65
- DOI:
- 10.18653/v1/2020.wnut-1.65
- Cite (ACL):
- Abhilasha Sancheti, Kushal Chawla, and Gaurav Verma. 2020. LynyrdSkynyrd at WNUT-2020 Task 2: Semi-Supervised Learning for Identification of Informative COVID-19 English Tweets. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 444–449, Online. Association for Computational Linguistics.
- Cite (Informal):
- LynyrdSkynyrd at WNUT-2020 Task 2: Semi-Supervised Learning for Identification of Informative COVID-19 English Tweets (Sancheti et al., WNUT 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2020.wnut-1.65.pdf