Comparison of Short-Text Sentiment Analysis Methods for Croatian

Leon Rotim, Jan Šnajder


Abstract
We focus on the task of supervised sentiment classification of short and informal texts in Croatian, using two simple yet effective methods: word embeddings and string kernels. We investigate whether word embeddings offer any advantage over corpus- and preprocessing-free string kernels, and how these compare to bag-of-words baselines. We conduct a comparison on three different datasets, using different preprocessing methods and kernel functions. Results show that, on two out of three datasets, word embeddings outperform string kernels, which in turn outperform word and n-gram bag-of-words baselines.
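To make the notion of a "corpus- and preprocessing-free" string kernel concrete, here is a minimal sketch of one common variant, the normalized character n-gram (spectrum) kernel. The abstract does not say which kernel functions the authors used, so the choice of kernel, the function names `char_ngrams` and `spectrum_kernel`, and the default `n = 3` are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int) -> Counter:
    """Count all contiguous character n-grams in the raw text (no tokenization)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a: str, b: str, n: int = 3) -> float:
    """Normalized n-spectrum kernel: cosine similarity of character n-gram counts.

    Illustrative assumption: this is one standard string kernel, not
    necessarily the one evaluated in the paper.
    """
    counts_a, counts_b = char_ngrams(a, n), char_ngrams(b, n)
    # Inner product over the n-grams the two strings share.
    dot = sum(count * counts_b[gram] for gram, count in counts_a.items())
    norm = math.sqrt(
        sum(v * v for v in counts_a.values())
        * sum(v * v for v in counts_b.values())
    )
    # Strings shorter than n have no n-grams; define their similarity as 0.
    return dot / norm if norm else 0.0
```

Because the kernel works directly on characters, inflected forms such as "odlično" and "odličan" still overlap in most of their trigrams without any stemming or lemmatization. A matrix of such pairwise values could then be passed to a kernel classifier that accepts precomputed kernels.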
Anthology ID:
W17-1411
Volume:
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Tomaž Erjavec, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue:
BSNLP
SIG:
SIGSLAV
Publisher:
Association for Computational Linguistics
Pages:
69–75
URL:
https://aclanthology.org/W17-1411
DOI:
10.18653/v1/W17-1411
Cite (ACL):
Leon Rotim and Jan Šnajder. 2017. Comparison of Short-Text Sentiment Analysis Methods for Croatian. In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, pages 69–75, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Comparison of Short-Text Sentiment Analysis Methods for Croatian (Rotim & Šnajder, BSNLP 2017)
PDF:
https://preview.aclanthology.org/landing_page/W17-1411.pdf