Linguist Geeks on WNUT-2020 Task 2: COVID-19 Informative Tweet Identification using Progressive Trained Language Models and Data Augmentation

Vasudev Awatramani, Anupam Kumar


Abstract
Since the outbreak of COVID-19, there has been a surge of digital content on social media, ranging from news articles and academic reports to tweets, videos, and even memes. Amid such an overabundance of data, it is crucial to distinguish which information is genuinely informative and which is merely sensational, redundant, or false. This work focuses on developing a language system that can differentiate between Informative and Uninformative tweets associated with COVID-19 for WNUT-2020 Shared Task 2. For this purpose, we employ deep transfer learning models such as BERT, along with techniques such as Noisy Data Augmentation and Progressive Training. The approach achieves a competitive F1-score of 0.8715 on the final test set.
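
As a rough illustration of the recipe the abstract describes, the sketch below fine-tunes a BERT-style classifier on tweets labeled Informative/Uninformative and augments the training data with noisy copies (random word dropout). The backbone name, dropout probability, learning rate, and example tweets are assumptions for illustration only, and the authors' progressive training schedule is not shown.

    # Minimal sketch, not the authors' exact code: fine-tune a pretrained
    # BERT-style classifier on COVID-19 tweets with a simple noisy-augmentation
    # step (random word dropout). Hyperparameters here are illustrative.
    import random
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_NAME = "bert-base-uncased"  # assumed backbone
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

    def noisy_augment(text: str, drop_prob: float = 0.1) -> str:
        """Create a noisy copy of a tweet by randomly dropping words."""
        words = text.split()
        kept = [w for w in words if random.random() > drop_prob]
        return " ".join(kept) if kept else text

    # Tiny illustrative dataset: label 1 = Informative, 0 = Uninformative.
    tweets = [
        ("Health ministry reports 1,200 new COVID-19 cases in the capital today.", 1),
        ("Can't believe how boring lockdown is lol", 0),
    ]
    # Extend the training set with noisy copies of each tweet.
    augmented = tweets + [(noisy_augment(t), y) for t, y in tweets]

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for epoch in range(2):  # illustrative number of epochs
        for text, label in augmented:
            batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
            out = model(**batch, labels=torch.tensor([label]))
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()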
Anthology ID:
2020.wnut-1.59
Volume:
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Month:
November
Year:
2020
Address:
Online
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
414–418
URL:
https://aclanthology.org/2020.wnut-1.59
DOI:
10.18653/v1/2020.wnut-1.59
Cite (ACL):
Vasudev Awatramani and Anupam Kumar. 2020. Linguist Geeks on WNUT-2020 Task 2: COVID-19 Informative Tweet Identification using Progressive Trained Language Models and Data Augmentation. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 414–418, Online. Association for Computational Linguistics.
Cite (Informal):
Linguist Geeks on WNUT-2020 Task 2: COVID-19 Informative Tweet Identification using Progressive Trained Language Models and Data Augmentation (Awatramani & Kumar, WNUT 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.wnut-1.59.pdf
Data
WNUT-2020 Task 2