Abstract
As social media has grown popular, millions of users produce a vast number of short, noisy messages whenever a hot event happens. Social summarization systems are therefore becoming increasingly critical for helping people quickly grasp the core information. However, publicly available, high-quality, large-scale social summarization datasets are rare. Constructing such a corpus is difficult and expensive because short texts have very complex social characteristics. In this paper, we construct TWEETSUM, a new event-oriented dataset for social summarization. The original data is collected from Twitter and contains 12 real-world hot events with a total of 44,034 tweets and 11,240 users. Each event has four expert summaries, and we also evaluate the annotation quality. In addition, we collect further social signals (i.e., user relations, hashtags, and user profiles) and establish a user relation network for each event. Besides the detailed dataset description, we report the performance of several typical extractive summarization methods on TWEETSUM to establish baselines. To support further research, we will release this dataset to the public.
- Anthology ID:
- 2020.coling-main.504
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 5731–5736
- URL:
- https://aclanthology.org/2020.coling-main.504
- DOI:
- 10.18653/v1/2020.coling-main.504
- Cite (ACL):
- Ruifang He, Liangliang Zhao, and Huanyu Liu. 2020. TWEETSUM: Event oriented Social Summarization Dataset. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5731–5736, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- TWEETSUM: Event oriented Social Summarization Dataset (He et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2020.coling-main.504.pdf