Sentiment Analysis of Tweets in Three Indian Languages

Shanta Phani, Shibamouli Lahiri, Arindam Biswas


Abstract
In this paper, we describe the results of sentiment analysis on tweets in three Indian languages – Bengali, Hindi, and Tamil. We used the recently released SAIL dataset (Patra et al., 2015), and obtained state-of-the-art results in all three languages. Our features are simple, robust, scalable, and language-independent. Further, we show that these simple features provide better results than more complex and language-specific features, in two separate classification tasks. Detailed feature analysis and error analysis have been reported, along with learning curves for Hindi and Bengali.
Anthology ID:
W16-3710
Volume:
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
WSSANLP
Publisher:
The COLING 2016 Organizing Committee
Pages:
93–102
URL:
https://aclanthology.org/W16-3710
Cite (ACL):
Shanta Phani, Shibamouli Lahiri, and Arindam Biswas. 2016. Sentiment Analysis of Tweets in Three Indian Languages. In Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016), pages 93–102, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Sentiment Analysis of Tweets in Three Indian Languages (Phani et al., WSSANLP 2016)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W16-3710.pdf