Abstract
We present in this paper our work on Algerian language, an under-resourced North African colloquial Arabic variety, for which we built a comparably large corpus of more than 36,000 code-switched user-generated comments annotated for sentiments. We opted for this data domain because Algerian is a colloquial language with no existing freely available corpora. Moreover, we compiled sentiment lexicons of positive and negative unigrams and bigrams reflecting the code-switches present in the language. We compare the performance of four models on the task of identifying sentiments, and the results indicate that a CNN model trained end-to-end fits better our unedited code-switched and unbalanced data across the predefined sentiment classes. Additionally, injecting the lexicons as background knowledge to the model boosts its performance on the minority class with a gain of 10.54 points on the F-score. The results of our experiments can be used as a baseline for future research for Algerian sentiment analysis.- Anthology ID:
- 2020.lrec-1.328
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association
- Note:
- Pages:
- 2698–2705
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.328
- DOI:
- Cite (ACL):
- Wafia Adouane, Samia Touileb, and Jean-Philippe Bernardy. 2020. Identifying Sentiments in Algerian Code-switched User-generated Comments. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2698–2705, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Identifying Sentiments in Algerian Code-switched User-generated Comments (Adouane et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.328.pdf