Abstract
In this paper, we describe the results of sentiment analysis on tweets in three Indian languages: Bengali, Hindi, and Tamil. We used the recently released SAIL dataset (Patra et al., 2015) and obtained state-of-the-art results in all three languages. Our features are simple, robust, scalable, and language-independent. Further, we show that these simple features provide better results than more complex and language-specific features in two separate classification tasks. We report detailed feature analysis and error analysis, along with learning curves for Hindi and Bengali.
- Anthology ID:
- W16-3710
- Volume:
- Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Venue:
- WSSANLP
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 93–102
- URL:
- https://aclanthology.org/W16-3710
- Cite (ACL):
- Shanta Phani, Shibamouli Lahiri, and Arindam Biswas. 2016. Sentiment Analysis of Tweets in Three Indian Languages. In Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016), pages 93–102, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Sentiment Analysis of Tweets in Three Indian Languages (Phani et al., WSSANLP 2016)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W16-3710.pdf