Abstract
In this paper, we present our system for the W-NUT 2020 shared task on COVID-19 Event Extraction from Twitter. To mitigate the noisy nature of the Twitter stream, our system uses COVID-Twitter-BERT (CT-BERT), a language model pre-trained on a large corpus of COVID-19 related Twitter messages. Our system is trained on the COVID-19 Twitter Event Corpus and identifies relevant text spans that answer pre-defined questions (i.e., slot types) for five COVID-19 related events (i.e., TESTED POSITIVE, TESTED NEGATIVE, CAN-NOT-TEST, DEATH and CURE & PREVENTION). We experimented with different architectures; our best performing model relies on a multilabel classifier on top of the CT-BERT model that jointly trains all the slot types for a single event. Our experimental results indicate that our Multilabel-CT-BERT system outperforms the baseline methods by 7 percentage points in terms of micro-average F1 score. Our model ranked 4th on the shared task leaderboard.
- Anthology ID:
- 2020.wnut-1.77
- Volume:
- Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
- Venue:
- WNUT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 505–513
- URL:
- https://aclanthology.org/2020.wnut-1.77
- DOI:
- 10.18653/v1/2020.wnut-1.77
- Cite (ACL):
- Xiangyu Yang, Giannis Bekoulis, and Nikos Deligiannis. 2020. imec-ETRO-VUB at W-NUT 2020 Shared Task-3: A multilabel BERT-based system for predicting COVID-19 events. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 505–513, Online. Association for Computational Linguistics.
- Cite (Informal):
- imec-ETRO-VUB at W-NUT 2020 Shared Task-3: A multilabel BERT-based system for predicting COVID-19 events (Yang et al., WNUT 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2020.wnut-1.77.pdf