Abstract
The reliability of self-labeled data is an important issue when the data are regarded as ground truth for training and testing learning-based models. This paper addresses the issue of false-alarm hashtags in the self-labeled data for irony detection. We analyze the ambiguity of hashtag usages and propose a novel neural network-based model, which incorporates linguistic information from different aspects, to disambiguate the usage of three hashtags that are widely used to collect the training data for irony detection. Furthermore, we apply our model to prune the self-labeled training data. Experimental results show that the irony detection model trained on the fewer but cleaner training instances outperforms the models trained on all data.
- Anthology ID:
- P18-2122
- Volume:
- Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month:
- July
- Year:
- 2018
- Address:
- Melbourne, Australia
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 771–777
- URL:
- https://aclanthology.org/P18-2122
- DOI:
- 10.18653/v1/P18-2122
- Cite (ACL):
- Hen-Hsen Huang, Chiao-Chen Chen, and Hsin-Hsi Chen. 2018. Disambiguating False-Alarm Hashtag Usages in Tweets for Irony Detection. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 771–777, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal):
- Disambiguating False-Alarm Hashtag Usages in Tweets for Irony Detection (Huang et al., ACL 2018)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/P18-2122.pdf