@inproceedings{doan-bao-etal-2020-sunbear,
    title     = {{SunBear} at {WNUT}-2020 Task 2: Improving {BERT}-Based Noisy Text Classification with Knowledge of the Data domain},
    author    = {Doan Bao, Linh and
                 Nguyen, Viet Anh and
                 Pham Huu, Quang},
    editor    = {Xu, Wei and
                 Ritter, Alan and
                 Baldwin, Tim and
                 Rahimi, Afshin},
    booktitle = {Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)},
    month     = nov,
    year      = {2020},
    address   = {Online},
    publisher = {Association for Computational Linguistics},
    url       = {https://aclanthology.org/2020.wnut-1.73/},
    doi       = {10.18653/v1/2020.wnut-1.73},
    pages     = {485--490},
    abstract  = {This paper proposes an improved custom model for WNUT task 2: Identification of Informative COVID-19 English Tweet. We improve experiment with the effectiveness of fine-tuning methodologies for state-of-the-art language model RoBERTa. We make a preliminary instantiation of this formal model for the text classification approaches. With appropriate training techniques, our model is able to achieve 0.9218 F1-score on public validation set and the ensemble version settles at top 9 F1-score (0.9005) and top 2 Recall (0.9301) on private test set.},
}
Markdown (Informal)
[SunBear at WNUT-2020 Task 2: Improving BERT-Based Noisy Text Classification with Knowledge of the Data domain](https://aclanthology.org/2020.wnut-1.73/) (Doan Bao et al., WNUT 2020)
ACL