Never Abandon Minorities: Exhaustive Extraction of Bursty Phrases on Microblogs Using Set Cover Problem

Masumi Shirakawa, Takahiro Hara, Takuya Maekawa


Abstract
We propose a language-independent, data-driven method to exhaustively extract bursty phrases of arbitrary forms (e.g., phrases other than simple noun phrases) from microblogs. The burst (i.e., a rapid increase in occurrences) of a phrase also causes bursts of the N-grams that overlap it, including incomplete ones. In other words, bursty incomplete N-grams inevitably overlap bursty phrases. Thus, the proposed method formulates the extraction of bursty phrases as a set cover problem in which all bursty N-grams are covered by a minimum set of bursty phrases. Experimental results using Japanese Twitter data showed that the proposed method outperformed word-based, noun phrase-based, and segmentation-based methods in terms of both accuracy and coverage.
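The set cover formulation in the abstract can be illustrated with a small sketch. The abstract does not specify how bursts are detected or how the cover is solved, so the code below makes several assumptions: the bursty N-grams are already given as token tuples, a phrase "covers" every bursty N-gram that occurs as a contiguous sub-sequence of it, and the standard greedy approximation to set cover is used in place of whatever solver the authors employ. The names `ngrams_of` and `greedy_phrase_cover` are hypothetical.

```python
def ngrams_of(tokens):
    """Return every contiguous sub-sequence (N-gram) of a token tuple."""
    n_tokens = len(tokens)
    return {
        tuple(tokens[i:i + n])
        for n in range(1, n_tokens + 1)
        for i in range(n_tokens - n + 1)
    }


def greedy_phrase_cover(bursty_ngrams):
    """Greedy set cover (an assumption, not the paper's solver).

    Universe: all bursty N-grams.
    Candidates: the same N-grams, each covering the bursty N-grams it contains.
    """
    universe = set(bursty_ngrams)
    covers = {cand: ngrams_of(cand) & universe for cand in universe}
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate covering the most uncovered N-grams;
        # ties are broken by length, then lexicographically, for determinism.
        best = max(covers, key=lambda c: (len(covers[c] & uncovered), len(c), c))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


# Toy input (invented for illustration, not data from the paper).
bursty = {
    ("new",), ("york",), ("marathon",),
    ("new", "york"), ("york", "marathon"),
    ("new", "york", "marathon"),
    ("earthquake",),
}
chosen = greedy_phrase_cover(bursty)
```

On this toy input, the incomplete N-grams such as `("york", "marathon")` are absorbed by the full phrase `("new", "york", "marathon")`, while the unrelated minority item `("earthquake",)` still has to be selected to complete the cover. Because every candidate covers at least itself, the loop always terminates.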
Anthology ID:
D17-1251
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2358–2367
URL:
https://aclanthology.org/D17-1251
DOI:
10.18653/v1/D17-1251
Cite (ACL):
Masumi Shirakawa, Takahiro Hara, and Takuya Maekawa. 2017. Never Abandon Minorities: Exhaustive Extraction of Bursty Phrases on Microblogs Using Set Cover Problem. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2358–2367, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Never Abandon Minorities: Exhaustive Extraction of Bursty Phrases on Microblogs Using Set Cover Problem (Shirakawa et al., EMNLP 2017)
PDF:
https://preview.aclanthology.org/naacl24-info/D17-1251.pdf