Abstract
This paper describes the winning model in the Arabic NLP4IF shared task on fighting the COVID-19 infodemic. The goal of the shared task is to detect disinformation about COVID-19 in Arabic tweets. Our proposed model ranked 1st with an F1-score of 0.780 and an accuracy of 0.762. We experimented with a variety of transformer-based pre-trained language models in this study. The best-scoring model is an ensemble of the AraBERT-Base, Asafya-BERT, and ARBERT models. One of the study’s key findings is the effect that pre-processing can have on each model’s score. In addition to describing the winning model, the study presents an error analysis.
- Anthology ID:
- 2021.nlp4if-1.15
- Volume:
- Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Venue:
- NLP4IF
- Publisher:
- Association for Computational Linguistics
- Pages:
- 104–109
- URL:
- https://aclanthology.org/2021.nlp4if-1.15
- DOI:
- 10.18653/v1/2021.nlp4if-1.15
- Cite (ACL):
- Ahmed Qarqaz, Dia Abujaber, and Malak Abdullah. 2021. R00 at NLP4IF-2021 Fighting COVID-19 Infodemic with Transformers and More Transformers. In Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda, pages 104–109, Online. Association for Computational Linguistics.
- Cite (Informal):
- R00 at NLP4IF-2021 Fighting COVID-19 Infodemic with Transformers and More Transformers (Qarqaz et al., NLP4IF 2021)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/2021.nlp4if-1.15.pdf
- Data
- ArCOV-19