Improving the results of string kernels in sentiment analysis and Arabic dialect identification by adapting them to your test set

Radu Tudor Ionescu, Andrei M. Butnaru


Abstract
Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as Arabic dialect identification or native language identification. In this paper, we apply two simple yet effective transductive learning approaches to further improve the results of string kernels. The first approach is based on interpreting the pairwise string kernel similarities between samples in the training set and samples in the test set as features. Our second approach is a simple self-training method based on two learning iterations. In the first iteration, a classifier is trained on the training set and tested on the test set, as usual. In the second iteration, a number of test samples (those to which the classifier assigned the highest confidence scores) are added to the training set for another round of training. The ground-truth labels of the added test samples are not required; instead, we use the labels predicted by the classifier in the first training iteration. By adapting string kernels to the test set, we report significantly better accuracy rates in English polarity classification and Arabic dialect identification.
Anthology ID: D18-1135
Volume: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month: October-November
Year: 2018
Address: Brussels, Belgium
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 1084–1090
URL: https://aclanthology.org/D18-1135
DOI: 10.18653/v1/D18-1135
Cite (ACL): Radu Tudor Ionescu and Andrei M. Butnaru. 2018. Improving the results of string kernels in sentiment analysis and Arabic dialect identification by adapting them to your test set. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1084–1090, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Improving the results of string kernels in sentiment analysis and Arabic dialect identification by adapting them to your test set (Ionescu & Butnaru, EMNLP 2018)
PDF: https://preview.aclanthology.org/nschneid-patch-3/D18-1135.pdf