Abstract
Hashtags are often employed on social media and beyond to add metadata to a textual utterance with the goal of increasing discoverability, aiding search, or providing additional semantics. However, the semantic content of hashtags is not straightforward to infer as these represent ad-hoc conventions which frequently include multiple words joined together and can include abbreviations and unorthodox spellings. We build a dataset of 12,594 hashtags split into individual segments and propose a set of approaches for hashtag segmentation by framing it as a pairwise ranking problem between candidate segmentations. Our novel neural approaches demonstrate 24.6% error reduction in hashtag segmentation accuracy compared to the current state-of-the-art method. Finally, we demonstrate that a deeper understanding of hashtag semantics obtained through segmentation is useful for downstream applications such as sentiment analysis, for which we achieved a 2.6% increase in average recall on the SemEval 2017 sentiment analysis dataset.
- Anthology ID:
- P19-1242
- Volume:
- Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Month:
- July
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Anna Korhonen, David Traum, Lluís Màrquez
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2538–2549
- URL:
- https://aclanthology.org/P19-1242
- DOI:
- 10.18653/v1/P19-1242
- Cite (ACL):
- Mounica Maddela, Wei Xu, and Daniel Preoţiuc-Pietro. 2019. Multi-task Pairwise Neural Ranking for Hashtag Segmentation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2538–2549, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Multi-task Pairwise Neural Ranking for Hashtag Segmentation (Maddela et al., ACL 2019)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/P19-1242.pdf
- Code:
- mounicam/hashtag_master
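
The abstract frames hashtag segmentation as a pairwise ranking problem between candidate segmentations. The following is a minimal sketch of that framing only, not the authors' neural model: the word-frequency table, the `unigram_score` stand-in comparator, the `top_k` pruning step, and all function names are assumptions introduced here for illustration. In the paper the pairwise comparator is learned; here it is a toy difference of unigram scores.

```python
import math
from itertools import combinations

# Hypothetical toy word frequencies (an assumption, not data from the paper).
WORD_FREQ = {"now": 120, "playing": 40, "no": 150, "play": 60, "ing": 5, "w": 1}
TOTAL = sum(WORD_FREQ.values())


def candidate_segmentations(hashtag):
    """Yield every split of `hashtag` into contiguous, non-empty segments."""
    n = len(hashtag)
    for k in range(n):  # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield tuple(hashtag[i:j] for i, j in zip(bounds, bounds[1:]))


def unigram_score(segmentation):
    """Toy score: sum of log-probabilities, with a penalty for unknown words."""
    total = 0.0
    for word in segmentation:
        if word in WORD_FREQ:
            total += math.log(WORD_FREQ[word] / TOTAL)
        else:
            total -= 10.0 * len(word)  # arbitrary length-scaled penalty
    return total


def prefer(a, b):
    """Stand-in pairwise comparator: positive when `a` is preferred over `b`."""
    return unigram_score(a) - unigram_score(b)


def segment(hashtag, top_k=10):
    """Keep the top_k candidates, then pick the one that wins the most pairs."""
    candidates = sorted(
        candidate_segmentations(hashtag), key=unigram_score, reverse=True
    )[:top_k]

    def wins(candidate):
        return sum(
            prefer(candidate, other) > 0
            for other in candidates
            if other is not candidate
        )

    return list(max(candidates, key=wins))


print(segment("nowplaying"))
```

Because this comparator is derived from a single per-candidate score, the pairwise step adds nothing over a plain sort; it is kept separate so that `prefer` could be swapped for a learned model that looks at both candidates jointly. Exhaustive enumeration produces 2^(n-1) candidates for a hashtag of length n, so it is only practical for short inputs.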