Manchester Metropolitan at SemEval-2018 Task 2: Random Forest with an Ensemble of Features for Predicting Emoji in Tweets

Luciano Gerber, Matthew Shardlow


Abstract
We present our submission to the SemEval-2018 task on emoji prediction. We used a random forest with an ensemble of bag-of-words, sentiment and psycholinguistic features. Although we performed well on the trial dataset (attaining a macro F-score of 63.185 for English and 81.381 for Spanish), our approach did not perform as well on the test data. We describe our features and classification protocol, as well as initial experiments, concluding with a discussion of the discrepancy between our trial and test results.
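For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a macro F-score can be computed: the F1 of each class is calculated separately and the results are averaged with equal weight, so rare emoji count as much as frequent ones. The function name `macro_f1`, the choice to score every label seen in either list, and the toy emoji labels are assumptions made for illustration; this is not the official task scorer or the authors' code.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Return the unweighted mean of per-class F1 scores, as a percentage.

    Assumption: every label that appears in either list is scored.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return 0.0

    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1  # p was predicted but is wrong
            false_neg[g] += 1  # g was the right answer but was missed

    per_class = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * true_pos[label] + false_pos[label] + false_neg[label]
        per_class.append(2 * true_pos[label] / denom if denom else 0.0)

    return 100.0 * sum(per_class) / len(per_class)


# Hypothetical toy labels, not taken from the task data.
gold = ["heart", "heart", "joy", "fire"]
predicted = ["heart", "joy", "joy", "heart"]
score = macro_f1(gold, predicted)
print(round(score, 3))
```

In this toy case the per-class F1 values are 0.5 for "heart", about 0.667 for "joy" and 0 for "fire", so the class that is never predicted pulls the average down sharply.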
Anthology ID:
S18-1079
Volume:
Proceedings of the 12th International Workshop on Semantic Evaluation
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marianna Apidianaki, Saif M. Mohammad, Jonathan May, Ekaterina Shutova, Steven Bethard, Marine Carpuat
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
491–496
URL:
https://aclanthology.org/S18-1079
DOI:
10.18653/v1/S18-1079
Cite (ACL):
Luciano Gerber and Matthew Shardlow. 2018. Manchester Metropolitan at SemEval-2018 Task 2: Random Forest with an Ensemble of Features for Predicting Emoji in Tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation, pages 491–496, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Manchester Metropolitan at SemEval-2018 Task 2: Random Forest with an Ensemble of Features for Predicting Emoji in Tweets (Gerber & Shardlow, SemEval 2018)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/S18-1079.pdf