IceSum: An Icelandic Text Summarization Corpus
Jón Daðason, Hrafn Loftsson, Salome Sigurðardóttir, Þorsteinn Björnsson
Abstract
Automatic Text Summarization (ATS) is the task of generating concise and fluent summaries from one or more documents. In this paper, we present IceSum, the first Icelandic corpus annotated with human-generated summaries. IceSum consists of 1,000 online news articles and their extractive summaries. We train and evaluate several neural network-based models on this dataset, comparing them against a selection of baseline methods. We find that an encoder-decoder model with a sequence-to-sequence based extractor obtains the best results, outperforming all baseline methods. Furthermore, we evaluate how the size of the training corpus affects the quality of the generated summaries. We release the corpus and the models with an open license.
- Anthology ID: 2021.naacl-srw.2
- Volume: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
- Month: June
- Year: 2021
- Address: Online
- Editors: Esin Durmus, Vivek Gupta, Nelson Liu, Nanyun Peng, Yu Su
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 9–14
- URL: https://aclanthology.org/2021.naacl-srw.2
- DOI: 10.18653/v1/2021.naacl-srw.2
- Cite (ACL): Jón Daðason, Hrafn Loftsson, Salome Sigurðardóttir, and Þorsteinn Björnsson. 2021. IceSum: An Icelandic Text Summarization Corpus. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 9–14, Online. Association for Computational Linguistics.
- Cite (Informal): IceSum: An Icelandic Text Summarization Corpus (Daðason et al., NAACL 2021)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/2021.naacl-srw.2.pdf