Creating and Evaluating Resources for Sentiment Analysis in the Low-resource Language: Sindhi
Wazir Ali, Naveed Ali, Yong Dai, Jay Kumar, Saifullah Tumrani, Zenglin Xu
Abstract
In this paper, we develop a Sindhi subjective lexicon by merging existing English resources (the NRC lexicon, a list of opinion words, and SentiWordNet) with a Sindhi-English bilingual dictionary and a collection of Sindhi modifiers. A positive or negative sentiment score is assigned to each Sindhi opinion word. We then determine the coverage of the proposed lexicon through subjectivity analysis. Moreover, we crawl a multi-domain tweet corpus covering news, sports, and finance. The crawled corpus is annotated by experienced annotators using the Doccano text annotation tool. The sentiment-annotated corpus is evaluated using a support vector machine (SVM), recurrent neural network (RNN) variants, and a convolutional neural network (CNN).
- Anthology ID:
- 2021.wassa-1.20
- Volume:
- Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
- Month:
- April
- Year:
- 2021
- Address:
- Online
- Venue:
- WASSA
- Publisher:
- Association for Computational Linguistics
- Pages:
- 188–194
- URL:
- https://aclanthology.org/2021.wassa-1.20
- Cite (ACL):
- Wazir Ali, Naveed Ali, Yong Dai, Jay Kumar, Saifullah Tumrani, and Zenglin Xu. 2021. Creating and Evaluating Resources for Sentiment Analysis in the Low-resource Language: Sindhi. In Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 188–194, Online. Association for Computational Linguistics.
- Cite (Informal):
- Creating and Evaluating Resources for Sentiment Analysis in the Low-resource Language: Sindhi (Ali et al., WASSA 2021)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2021.wassa-1.20.pdf