Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments

Salima Medhaffar, Fethi Bougares, Yannick Estève, Lamia Hadrich-Belguith


Abstract
Dialectal Arabic (DA) differs significantly from the Arabic taught in schools and used in written communication and formal speech (broadcast news, religion, politics, etc.). Much research exists on Arabic Sentiment Analysis (SA); however, it is generally restricted to Modern Standard Arabic (MSA) or to dialects of economic or political interest. In this paper we address SA of the Tunisian Dialect. We use Machine Learning techniques to determine the polarity of comments written in Tunisian Dialect. First, we evaluate the performance of SA systems with models trained on freely available MSA and multi-dialectal data sets. We then collect and annotate a Tunisian Dialect corpus of 17,000 comments from Facebook. This corpus yields a significant accuracy improvement over the best model trained on other Arabic dialects or MSA data. We believe that this first freely available corpus will be valuable to researchers working on Tunisian Sentiment Analysis and similar areas.
Anthology ID:
W17-1307
Volume:
Proceedings of the Third Arabic Natural Language Processing Workshop
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Nizar Habash, Mona Diab, Kareem Darwish, Wassim El-Hajj, Hend Al-Khalifa, Houda Bouamor, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
SIG:
SEMITIC
Publisher:
Association for Computational Linguistics
Pages:
55–61
URL:
https://aclanthology.org/W17-1307
DOI:
10.18653/v1/W17-1307
Cite (ACL):
Salima Medhaffar, Fethi Bougares, Yannick Estève, and Lamia Hadrich-Belguith. 2017. Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments. In Proceedings of the Third Arabic Natural Language Processing Workshop, pages 55–61, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments (Medhaffar et al., WANLP 2017)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/W17-1307.pdf
Code
fbougares/TSAC
Data
TSAC, LABR