HLP@UPenn at SemEval-2017 Task 4A: A simple, self-optimizing text classification system combining dense and sparse vectors

Abeed Sarker; Graciela Gonzalez

doi:10.18653/v1/S17-2105


Abstract

We present a simple supervised text classification system that combines sparse and dense vector representations of words with generalized word representations derived from word clusters. The sparse vectors are generated from word n-grams (n = 1 to 3). The dense vector representations of words (embeddings) are learned by training a neural network to predict neighboring words in a large unlabeled dataset. To classify a text segment, its different representations are concatenated, and classification is performed with Support Vector Machines (SVMs). The system is intended particularly for non-experts in natural language processing and machine learning, so it requires no manual tuning of parameters or weights. Given a training set, the system automatically generates the training vectors, optimizes the relevant hyper-parameters of the SVM classifier, and trains the classification model. We evaluated the system on the SemEval-2017 English sentiment analysis task (Task 4A). In terms of average F1-score, our system ranked 8th out of 39 submissions (F1-score: 0.632, average recall: 0.637, accuracy: 0.646).
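The pipeline summarized in the abstract (n-gram, cluster and embedding features concatenated into one vector, then an SVM whose hyper-parameter is chosen automatically) can be illustrated with a minimal, standard-library-only sketch. The abstract does not say which SVM implementation or search procedure was used, so the following are assumptions, not the authors' method: a linear SVM trained with the Pegasos stochastic subgradient method (no bias term, fixed visiting order), one-vs-rest over the sentiment classes, and a k-fold grid search over the regularization strength `lam` (roughly the inverse of the usual SVM cost parameter C). All function names (`featurize`, `fit_self_tuning`, etc.) are hypothetical, and in the described system the embeddings would be learned from a large unlabeled corpus rather than supplied as a small hand-made table.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Dict[str, float]  # sparse vector: feature name -> value


def word_ngrams(tokens: Sequence[str], max_n: int = 3) -> List[str]:
    """All contiguous word n-grams with 1 <= n <= max_n."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def add(feats: Vec, key: str, value: float = 1.0) -> None:
    feats[key] = feats.get(key, 0.0) + value


def featurize(text: str, embeddings: Dict[str, List[float]],
              clusters: Dict[str, str]) -> Vec:
    """Concatenate n-gram counts, word-cluster counts and the mean word embedding."""
    tokens = text.lower().split()
    feats: Vec = {}
    for gram in word_ngrams(tokens):          # sparse part: n-grams, n = 1..3
        add(feats, "ng:" + gram)
    for tok in tokens:                        # generalized part: cluster ids
        if tok in clusters:
            add(feats, "cl:" + clusters[tok])
    known = [embeddings[t] for t in tokens if t in embeddings]
    if known:                                 # dense part: averaged embeddings
        for d in range(len(known[0])):
            feats[f"emb:{d}"] = sum(v[d] for v in known) / len(known)
    return feats


def dot(w: Vec, x: Vec) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_binary(xs: List[Vec], ys: List[int], lam: float, epochs: int) -> Vec:
    """Linear SVM (hinge loss + L2 penalty) via Pegasos; fixed order, no bias."""
    w: Vec = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0    # margin check uses the current w
            for k in w:                       # shrink step from the L2 penalty
                w[k] *= 1.0 - eta * lam
            if violated:                      # hinge-loss subgradient step
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def train_ovr(xs: List[Vec], labels: List[str], lam: float,
              epochs: int) -> Dict[str, Vec]:
    """One binary SVM per class (one-vs-rest)."""
    return {c: train_binary(xs, [1 if lab == c else -1 for lab in labels], lam, epochs)
            for c in sorted(set(labels))}


def predict(model: Dict[str, Vec], x: Vec) -> str:
    return max(model, key=lambda c: dot(model[c], x))


def fit_self_tuning(texts: List[str], labels: List[str],
                    embeddings: Dict[str, List[float]], clusters: Dict[str, str],
                    grid: Sequence[float] = (1.0, 0.1, 0.01, 0.001),
                    k: int = 2, epochs: int = 50) -> Tuple[Dict[str, Vec], float]:
    """Pick lam by k-fold cross-validated accuracy, then retrain on all data."""
    xs = [featurize(t, embeddings, clusters) for t in texts]
    best_lam, best_acc = grid[0], -1.0
    for lam in grid:
        correct = 0
        for fold in range(k):
            tr = [i for i in range(len(xs)) if i % k != fold]
            te = [i for i in range(len(xs)) if i % k == fold]
            model = train_ovr([xs[i] for i in tr], [labels[i] for i in tr], lam, epochs)
            correct += sum(predict(model, xs[i]) == labels[i] for i in te)
        acc = correct / len(xs)
        if acc > best_acc:                    # ties keep the earlier grid value
            best_lam, best_acc = lam, acc
    return train_ovr(xs, labels, best_lam, epochs), best_lam
```

Keeping every feature group as a prefixed key in one sparse dictionary makes the "concatenation" step trivial: the dense embedding dimensions simply become a few extra named features next to the n-gram and cluster counts, and the user only supplies texts and labels.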

Anthology ID:: S17-2105
Volume:: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Month:: August
Year:: 2017
Address:: Vancouver, Canada
Editors:: Steven Bethard, Marine Carpuat, Marianna Apidianaki, Saif M. Mohammad, Daniel Cer, David Jurgens
Venue:: SemEval
SIG:: SIGLEX
Publisher:: Association for Computational Linguistics
Pages:: 640–643
URL:: https://aclanthology.org/S17-2105
DOI:: 10.18653/v1/S17-2105
Cite (ACL):: Abeed Sarker and Graciela Gonzalez. 2017. HLP@UPenn at SemEval-2017 Task 4A: A simple, self-optimizing text classification system combining dense and sparse vectors. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 640–643, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):: HLP@UPenn at SemEval-2017 Task 4A: A simple, self-optimizing text classification system combining dense and sparse vectors (Sarker & Gonzalez, SemEval 2017)
PDF:: https://preview.aclanthology.org/proper-vol2-ingestion/S17-2105.pdf