Abstract
Sentiment analysis for code-mixed social media text continues to be an under-explored area. This work explores two common approaches: fine-tuning large transformer models and sample-efficient methods such as ULMFiT. Prior work demonstrates the efficacy of classical ML methods for polarity detection. Fine-tuned general-purpose language representation models, such as those of the BERT family, are benchmarked along with classical machine learning and ensemble methods. We show that NB-SVM beats RoBERTa by 6.2% (relative) F1. The best-performing model is a majority-vote ensemble, which achieves an F1 of 0.707. The leaderboard submission was made under the CodaLab username nirantk, with an F1 of 0.689.
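The abstract names an NB-SVM baseline and a majority-vote ensemble without showing either. The sketch below is a minimal illustration of both ideas, assuming three-way polarity labels (negative/neutral/positive) and scikit-learn; the TF-IDF features, hyperparameters, and function names (`fit_nbsvm`, `majority_vote`) are illustrative assumptions, not the authors' exact configuration (see the linked NirantK/Hinglish repository for that).

```python
# Hedged sketch of an NB-SVM-style baseline (log-count-ratio features + linear
# classifier, after Wang & Manning 2012) and a simple majority-vote ensemble.
# Inputs `texts` (list of tweets) and `labels` (negative/neutral/positive) are assumed.
from collections import Counter

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelBinarizer


def log_count_ratio(X, y_bin, alpha=1.0):
    """Naive Bayes log-count ratio for one binary (one-vs-rest) label column."""
    p = alpha + X[y_bin == 1].sum(axis=0)
    q = alpha + X[y_bin == 0].sum(axis=0)
    return np.asarray(np.log((p / p.sum()) / (q / q.sum())))


def fit_nbsvm(texts, labels):
    """Fit one-vs-rest NB-SVM-style models: logistic regression on NB-scaled TF-IDF."""
    vec = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)
    X = vec.fit_transform(texts)
    lb = LabelBinarizer()
    Y = lb.fit_transform(labels)  # one column per class for 3-way polarity
    models, ratios = [], []
    for k in range(Y.shape[1]):
        r = log_count_ratio(X, Y[:, k])          # shape (1, vocab_size)
        clf = LogisticRegression(C=4.0, max_iter=1000)
        clf.fit(X.multiply(r).tocsr(), Y[:, k])  # "NB features": X scaled by r
        models.append(clf)
        ratios.append(r)
    return vec, lb, models, ratios


def predict_nbsvm(texts, vec, lb, models, ratios):
    """Predict by taking the class with the highest one-vs-rest decision score."""
    X = vec.transform(texts)
    scores = np.column_stack(
        [m.decision_function(X.multiply(r).tocsr()) for m, r in zip(models, ratios)]
    )
    return lb.classes_[scores.argmax(axis=1)]


def majority_vote(prediction_lists):
    """Row-wise majority vote over several models' label predictions (ties: first seen)."""
    stacked = np.stack([np.asarray(p) for p in prediction_lists], axis=1)
    return np.array([Counter(row).most_common(1)[0][0] for row in stacked])
```

In this sketch, the ensemble would combine per-example label predictions from, e.g., the NB-SVM baseline and fine-tuned transformer models via `majority_vote([preds_a, preds_b, preds_c])`; the paper's reported 0.707 F1 comes from its own ensemble, not from this illustrative code.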
- Anthology ID: 2020.semeval-1.119
- Volume: Proceedings of the Fourteenth Workshop on Semantic Evaluation
- Month: December
- Year: 2020
- Address: Barcelona (online)
- Editors: Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
- Venue: SemEval
- SIG: SIGLEX
- Publisher: International Committee for Computational Linguistics
- Pages: 934–939
- URL: https://aclanthology.org/2020.semeval-1.119
- DOI: 10.18653/v1/2020.semeval-1.119
- Cite (ACL): Meghana Bhange and Nirant Kasliwal. 2020. HinglishNLP at SemEval-2020 Task 9: Fine-tuned Language Models for Hinglish Sentiment Detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 934–939, Barcelona (online). International Committee for Computational Linguistics.
- Cite (Informal): HinglishNLP at SemEval-2020 Task 9: Fine-tuned Language Models for Hinglish Sentiment Detection (Bhange & Kasliwal, SemEval 2020)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/2020.semeval-1.119.pdf
- Code: NirantK/Hinglish + additional community code
- Data: SentiMix