Gold Corpus for Telegraphic Summarization
Chanakya Malireddy, Srivenkata N M Somisetty, Manish Shrivastava
Abstract
Most extractive summarization techniques operate by ranking all the source sentences and then selecting the top-ranked sentences as the summary. Such methods are known to produce good summaries, especially when applied to news articles and scientific texts. However, they don’t fare so well when applied to texts such as fictional narratives, which don’t have a single central or recurrent theme. This is because the information or plot of the story is usually spread across several sentences. In this paper, we discuss a different summarization technique called Telegraphic Summarization. Here, rather than selecting whole sentences, we pick short segments of text spread across sentences as the summary. We have tailored a set of guidelines to create such summaries and, using them, annotate a gold corpus of 200 English short stories.
- Anthology ID:
- W18-3810
- Volume:
- Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Editors:
- Peter Machonis, Anabela Barreiro, Kristina Kocijan, Max Silberztein
- Venue:
- LR4NLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 71–77
- URL:
- https://aclanthology.org/W18-3810
- Cite (ACL):
- Chanakya Malireddy, Srivenkata N M Somisetty, and Manish Shrivastava. 2018. Gold Corpus for Telegraphic Summarization. In Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing, pages 71–77, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Gold Corpus for Telegraphic Summarization (Malireddy et al., LR4NLP 2018)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/W18-3810.pdf
- Code:
- m-chanakya/shortstories
- Data:
- Telegraphic Summaries