Abstract
We propose a language-independent data-driven method to exhaustively extract bursty phrases of arbitrary forms (e.g., phrases other than simple noun phrases) from microblogs. The burst (i.e., the rapid increase of the occurrence) of a phrase causes the burst of overlapping N-grams including incomplete ones. In other words, bursty incomplete N-grams inevitably overlap bursty phrases. Thus, the proposed method performs the extraction of bursty phrases as the set cover problem in which all bursty N-grams are covered by a minimum set of bursty phrases. Experimental results using Japanese Twitter data showed that the proposed method outperformed word-based, noun phrase-based, and segmentation-based methods both in terms of accuracy and coverage.
- Anthology ID:
- D17-1251
- Volume:
- Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Editors:
- Martha Palmer, Rebecca Hwa, Sebastian Riedel
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2358–2367
- URL:
- https://aclanthology.org/D17-1251
- DOI:
- 10.18653/v1/D17-1251
- Cite (ACL):
- Masumi Shirakawa, Takahiro Hara, and Takuya Maekawa. 2017. Never Abandon Minorities: Exhaustive Extraction of Bursty Phrases on Microblogs Using Set Cover Problem. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2358–2367, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- Never Abandon Minorities: Exhaustive Extraction of Bursty Phrases on Microblogs Using Set Cover Problem (Shirakawa et al., EMNLP 2017)
- PDF:
- https://preview.aclanthology.org/naacl24-info/D17-1251.pdf
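
The set cover formulation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard greedy approximation for set cover, and it assumes, for simplicity, that a candidate phrase "covers" a bursty N-gram when the N-gram occurs as a contiguous token run inside the phrase. The function names `contains` and `greedy_cover` and the toy data are hypothetical.

```python
def contains(phrase, ngram):
    """Return True if ngram occurs as a contiguous token run inside phrase."""
    n = len(ngram)
    return any(phrase[i:i + n] == ngram for i in range(len(phrase) - n + 1))


def greedy_cover(bursty_ngrams, candidates):
    """Greedy set cover approximation (an assumption, not the paper's solver).

    Repeatedly picks the candidate phrase that covers the most N-grams
    that are still uncovered, until every coverable N-gram is covered.
    """
    # Map each candidate phrase to the set of bursty N-grams it covers.
    covers = {p: {g for g in bursty_ngrams if contains(p, g)} for p in candidates}
    # Only N-grams covered by at least one candidate can ever be covered.
    uncovered = set().union(*covers.values()) if covers else set()
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda p: len(covers[p] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


# Toy data (invented for illustration): token tuples standing in for N-grams.
bursty_ngrams = [
    ("never",), ("never", "abandon"), ("abandon", "minorities"),
    ("minorities",), ("set", "cover"), ("cover",),
]
candidates = [
    ("never", "abandon"), ("never", "abandon", "minorities"),
    ("set", "cover"), ("cover",),
]
selected = greedy_cover(bursty_ngrams, candidates)
print(selected)
```

In this toy case the incomplete N-grams such as `("never", "abandon")` are absorbed by the longer phrase that contains them, so only two phrases are needed to cover all six bursty N-grams.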