@inproceedings{hussein-etal-2021-damascusteam,
title = "{D}amascus{T}eam at {NLP}4{IF}2021: Fighting the {A}rabic {COVID}-19 Infodemic on {T}witter Using {A}ra{BERT}",
author = "Hussein, Ahmad and
Ghneim, Nada and
Joukhadar, Ammar",
editor = "Feldman, Anna and
Da San Martino, Giovanni and
Leberknight, Chris and
Nakov, Preslav",
booktitle = "Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda",
month = jun,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.nlp4if-1.13/",
doi = "10.18653/v1/2021.nlp4if-1.13",
pages = "93--98",
    abstract = "The objective of this work was to introduce an effective approach, based on the AraBERT language model, for fighting the COVID-19 infodemic on Twitter. It was arranged as a two-step pipeline: the first step applied a series of pre-processing procedures to transform Twitter jargon, including emojis and emoticons, into plain text, and the second step fine-tuned a version of AraBERT, pre-trained on plain text, to classify the tweets with respect to their label. The use of language models pre-trained on plain text rather than on tweets was motivated by two critical issues highlighted in the scientific literature: (1) pre-trained language models are widely available in many languages, avoiding the time-consuming and resource-intensive training of models on tweets from scratch and allowing one to focus only on fine-tuning; (2) available plain-text corpora are larger than tweet-only ones, allowing for better performance."
}
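The two-step pipeline described in the abstract can be illustrated with a short, hypothetical sketch: a pre-processing step that converts emojis, emoticons, and other Twitter jargon into plain text, followed by classification with a plain-text AraBERT checkpoint. The checkpoint name (`aubmindlab/bert-base-arabertv02`), the `emoji` package, and the binary label head are assumptions for illustration only; the authors' own code and label scheme are not reproduced here.

```python
# Hypothetical sketch of the two-step pipeline described in the abstract.
# Assumptions (not from the paper): the `emoji` package for demojizing,
# the `aubmindlab/bert-base-arabertv02` plain-text AraBERT checkpoint,
# and a binary classification head (num_labels=2).
import re

import emoji  # pip install emoji
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "aubmindlab/bert-base-arabertv02"  # assumed AraBERT variant

def preprocess(tweet: str) -> str:
    """Step 1: transform Twitter jargon into plain text."""
    text = emoji.demojize(tweet)                     # emojis -> ':face_with_mask:'
    text = text.replace(":", " ").replace("_", " ")  # flatten emoji aliases
    text = re.sub(r"https?://\S+", " ", text)        # strip URLs
    text = re.sub(r"[@#]\w+", " ", text)             # strip mentions/hashtags
    return re.sub(r"\s+", " ", text).strip()

# Step 2: classify with AraBERT pre-trained on plain text (actual
# fine-tuning would run a standard training loop over the task data).
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

inputs = tokenizer(preprocess("البقاء في المنزل 😷 #COVID19 https://t.co/x"),
                   return_tensors="pt", truncation=True, max_length=128)
predicted_label = model(**inputs).logits.argmax(dim=-1).item()
```

The split mirrors the paper's stated motivation: keeping the model a stock plain-text AraBERT means only the pre-processing step has to absorb Twitter-specific noise.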
Markdown (Informal)
[DamascusTeam at NLP4IF2021: Fighting the Arabic COVID-19 Infodemic on Twitter Using AraBERT](https://aclanthology.org/2021.nlp4if-1.13/) (Hussein et al., NLP4IF 2021)