Abstract
Summaries, keyphrases, and titles are different ways of concisely capturing the content of a document. While most previous work has released keyphrase and summarization datasets separately, in this work we introduce LipKey, the largest news corpus with human-written abstractive summaries, absent keyphrases, and titles. We use the three elements jointly, via multi-task training and by training with them as joint structured inputs, in the context of document summarization. We find that including absent keyphrases and titles as additional context to the source document improves transformer-based summarization models.
- Anthology ID: 2022.coling-1.303
- Volume: Proceedings of the 29th International Conference on Computational Linguistics
- Month: October
- Year: 2022
- Address: Gyeongju, Republic of Korea
- Venue: COLING
- Publisher: International Committee on Computational Linguistics
- Pages: 3427–3437
- URL: https://aclanthology.org/2022.coling-1.303
- Cite (ACL): Fajri Koto, Timothy Baldwin, and Jey Han Lau. 2022. LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3427–3437, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal): LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization (Koto et al., COLING 2022)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2022.coling-1.303.pdf
- Data: IndoSum, KPTimes, Liputan6