NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets

Anders Giovanni Møller, Rob van der Goot, Barbara Plank


Abstract
With the COVID-19 pandemic raging worldwide since the beginning of 2020, the need for monitoring systems to track relevant information on social media has become vital. This paper describes our submission to WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. We investigate the effectiveness of a variety of classification models and find that domain-specific pre-trained BERT models lead to the best performance. On top of this, we attempt a variety of ensembling strategies, but these do not lead to further improvements. Our final best model, the standalone CT-BERT model, proved highly competitive, leading to a shared first place in the shared task. Our results emphasize the importance of domain- and task-related pre-training.
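At its core, the winning approach is fine-tuning a domain-specific pre-trained transformer (CT-BERT) for binary tweet classification. The sketch below is a minimal illustration of that setup, not the authors' released code: it assumes the Hugging Face transformers library and the publicly available digitalepidemiologylab/covid-twitter-bert-v2 checkpoint, and the toy tweets, labels, and learning rate are placeholders rather than the paper's actual data or hyperparameters.

# Minimal sketch (assumptions noted above): fine-tune CT-BERT as a
# binary classifier for INFORMATIVE vs. UNINFORMATIVE tweets.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "digitalepidemiologylab/covid-twitter-bert-v2"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy examples standing in for the WNUT-2020 Task 2 training data.
tweets = [
    "3 new confirmed COVID-19 cases reported in the region today.",
    "Stay safe everyone, wash your hands!",
]
labels = torch.tensor([1, 0])  # 1 = INFORMATIVE, 0 = UNINFORMATIVE

batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed learning rate

# One illustrative training step; a real run would loop over epochs
# and mini-batches and evaluate on the task's validation split.
model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss over the 2 classes
outputs.loss.backward()
optimizer.step()

At inference time, outputs.logits.argmax(-1) yields the predicted label per tweet; the paper's reported result comes from this standalone fine-tuned model rather than from any of the ensembles it also explores.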
Anthology ID:
2020.wnut-1.44
Volume:
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Month:
November
Year:
2020
Address:
Online
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
331–336
URL:
https://aclanthology.org/2020.wnut-1.44
DOI:
10.18653/v1/2020.wnut-1.44
Cite (ACL):
Anders Giovanni Møller, Rob van der Goot, and Barbara Plank. 2020. NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 331–336, Online. Association for Computational Linguistics.
Cite (Informal):
NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets (Giovanni Møller et al., WNUT 2020)
PDF:
https://aclanthology.org/2020.wnut-1.44.pdf
Data
WNUT-2020 Task 2