HashCount at SemEval-2018 Task 3: Concatenative Featurization of Tweet and Hashtags for Irony Detection

Won Ik Cho, Woo Hyun Kang, Nam Soo Kim


Abstract
This paper proposes a novel feature extraction process for SemEval-2018 Task 3: Irony detection in English tweets. The proposed system uses a concatenative featurization of the tweet and its hashtags, which helps distinguish the irony-related components from the others. The system embeds tweets into a vector sequence using widely used pretrained word vectors, and falls back on a character embedding for out-of-vocabulary words. Identification was performed with BiLSTM and CNN classifiers, achieving F1 scores of 0.5939 (23/42) and 0.3925 (10/28) for the binary and the multi-class case, respectively. The reliability of the proposed scheme was verified through an analysis of the Gold test data, which shows how hashtags can be taken into account when identifying various types of irony.
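The abstract does not spell out how the tweet and hashtag parts are combined, so the Python sketch below shows one plausible reading rather than the authors' implementation. It makes several assumptions: hashtags are identified by their `#` prefix, the pretrained word vectors are a plain dictionary, the character embedding is a toy deterministic stand-in for a learned one, and the padding lengths `max_words` and `max_tags` are hypothetical. All function names are illustrative.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def split_tweet(tweet: str) -> Tuple[List[str], List[str]]:
    """Separate ordinary words from hashtag words, dropping the '#' prefix."""
    tokens = tweet.lower().split()
    words = [t for t in tokens if not t.startswith("#")]
    hashtags = [t[1:] for t in tokens if t.startswith("#") and len(t) > 1]
    return words, hashtags


def char_embedding(word: str, dim: int) -> Vector:
    """Hypothetical OOV fallback: average of deterministic per-character
    vectors, a toy stand-in for a learned character-level embedding."""
    vec = [0.0] * dim
    for ch in word:
        for i in range(dim):
            vec[i] += (ord(ch) * (i + 1) % 17) / 17.0
    return [v / max(len(word), 1) for v in vec]


def embed(tokens: List[str], table: Dict[str, Vector], dim: int,
          max_len: int) -> List[Vector]:
    """Look up each token in the pretrained table, use the character
    embedding for OOV tokens, then truncate or zero-pad to max_len vectors."""
    seq = [table[t] if t in table else char_embedding(t, dim)
           for t in tokens[:max_len]]
    seq.extend([0.0] * dim for _ in range(max_len - len(seq)))
    return seq


def featurize(tweet: str, table: Dict[str, Vector], dim: int,
              max_words: int = 20, max_tags: int = 5) -> List[Vector]:
    """Concatenate the padded word sequence and the padded hashtag sequence
    into one fixed-length sequence (assumed input to a BiLSTM or CNN)."""
    words, tags = split_tweet(tweet)
    return embed(words, table, dim, max_words) + embed(tags, table, dim, max_tags)
```

For example, with `table = {"great": [1.0, 0.0], "monday": [0.0, 1.0]}`, calling `featurize("Great another Monday #not #blessed", table, dim=2, max_words=4, max_tags=2)` yields six vectors: the three word slots (with "another" filled by the fallback), one zero pad, then the two hashtag slots. Keeping the hashtags in their own fixed-position segment is what lets a downstream classifier treat them differently from the tweet body.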
Anthology ID:
S18-1089
Volume:
Proceedings of The 12th International Workshop on Semantic Evaluation
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Venue:
SemEval
SIGs:
SIGLEX | SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
546–552
URL:
https://aclanthology.org/S18-1089
DOI:
10.18653/v1/S18-1089
Cite (ACL):
Won Ik Cho, Woo Hyun Kang, and Nam Soo Kim. 2018. HashCount at SemEval-2018 Task 3: Concatenative Featurization of Tweet and Hashtags for Irony Detection. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 546–552, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
HashCount at SemEval-2018 Task 3: Concatenative Featurization of Tweet and Hashtags for Irony Detection (Cho et al., SemEval 2018)
PDF:
https://preview.aclanthology.org/update-css-js/S18-1089.pdf