Abstract
Globalization has caused the rise of the code-switching phenomenon in multilingual societies. In Arab countries, code-switching between Arabic and English has become frequent, especially on social media platforms. Consequently, research on Natural Language Processing (NLP) systems that tackle this phenomenon has increased. One of the significant challenges in developing code-switched NLP systems is the lack of data itself. In this paper, we propose open-source, trained bilingual contextual word embedding models based on FLAIR, BERT, and ELECTRA. We also propose a novel contextual word embedding model called KERMIT, which maps Arabic and English words into one vector space and is efficient in terms of data usage. We applied intrinsic and extrinsic evaluation methods to compare the performance of the models. Our results show that FLAIR and FastText achieve the highest results in the sentiment analysis task. However, KERMIT is the best-performing model on the intrinsic evaluation and on named entity recognition. It also outperforms the other transformer-based models on the question answering task.
- Anthology ID:
- 2020.wanlp-1.20
- Volume:
- Proceedings of the Fifth Arabic Natural Language Processing Workshop
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Imed Zitouni, Muhammad Abdul-Mageed, Houda Bouamor, Fethi Bougares, Mahmoud El-Haj, Nadi Tomeh, Wajdi Zaghouani
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 215–225
- URL:
- https://aclanthology.org/2020.wanlp-1.20
- Cite (ACL):
- Caroline Sabty, Mohamed Islam, and Slim Abdennadher. 2020. Contextual Embeddings for Arabic-English Code-Switched Data. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 215–225, Barcelona, Spain (Online). Association for Computational Linguistics.
- Cite (Informal):
- Contextual Embeddings for Arabic-English Code-Switched Data (Sabty et al., WANLP 2020)
- PDF:
- https://preview.aclanthology.org/emnlp-22-attachments/2020.wanlp-1.20.pdf
- Code:
- csabty/code-switch-arabic-english-contextual-embeddings