CXP949 at WNUT-2020 Task 2: Extracting Informative COVID-19 Tweets - RoBERTa Ensembles and The Continued Relevance of Handcrafted Features

Calum Perrio; Harish Tayyar Madabushi

doi:10.18653/v1/2020.wnut-1.48

Abstract

This paper presents our submission to Task 2 of the Workshop on Noisy User-generated Text. We explore improving the performance of a pre-trained transformer-based language model fine-tuned for text classification through an ensemble implementation that makes use of corpus-level information and a handcrafted feature. We test whether including these features helps accommodate the challenges of a noisy data set centred on a specific subject outside the remit of the pre-training data. We show that including additional features can improve classification results, and our system achieves a score within 2 points of the top-performing team.

Anthology ID: 2020.wnut-1.48
Volume: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Month: November
Year: 2020
Address: Online
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 352–358
URL: https://aclanthology.org/2020.wnut-1.48
DOI: 10.18653/v1/2020.wnut-1.48
Cite (ACL): Calum Perrio and Harish Tayyar Madabushi. 2020. CXP949 at WNUT-2020 Task 2: Extracting Informative COVID-19 Tweets - RoBERTa Ensembles and The Continued Relevance of Handcrafted Features. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 352–358, Online. Association for Computational Linguistics.
Cite (Informal): CXP949 at WNUT-2020 Task 2: Extracting Informative COVID-19 Tweets - RoBERTa Ensembles and The Continued Relevance of Handcrafted Features (Perrio & Tayyar Madabushi, WNUT 2020)
PDF: https://preview.aclanthology.org/remove-xml-comments/2020.wnut-1.48.pdf
