Few-Shot Learning of an Interleaved Text Summarization Model by Pretraining with Synthetic Data
Sanjeev Kumar Karn, Francine Chen, Yan-Ying Chen, Ulli Waltinger, Hinrich Schütze
Abstract
Interleaved texts, where posts belonging to different threads occur in a sequence, are common in online chats, and it can be time-consuming to obtain an overview of the discussions. Existing systems first disentangle the posts by thread and then extract summaries from those threads. A major issue with such systems is error propagation from the disentanglement component. While an end-to-end trainable summarization system could obviate explicit disentanglement, such systems require a large amount of labeled data. To address this, we propose to pretrain an end-to-end trainable hierarchical encoder-decoder system using synthetic interleaved texts. We show that by fine-tuning on a real-world meeting dataset (AMI), such a system outperforms a traditional two-step system by 22%. We also compare against transformer models and observe that pretraining both the encoder and the decoder with synthetic data outperforms the BertSumExtAbs transformer model, which pretrains only the encoder on a large dataset.
- Anthology ID:
- 2021.adaptnlp-1.24
- Volume:
- Proceedings of the Second Workshop on Domain Adaptation for NLP
- Month:
- April
- Year:
- 2021
- Address:
- Kyiv, Ukraine
- Editors:
- Eyal Ben-David, Shay Cohen, Ryan McDonald, Barbara Plank, Roi Reichart, Guy Rotman, Yftah Ziser
- Venue:
- AdaptNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 245–254
- URL:
- https://aclanthology.org/2021.adaptnlp-1.24
- Cite (ACL):
- Sanjeev Kumar Karn, Francine Chen, Yan-Ying Chen, Ulli Waltinger, and Hinrich Schütze. 2021. Few-Shot Learning of an Interleaved Text Summarization Model by Pretraining with Synthetic Data. In Proceedings of the Second Workshop on Domain Adaptation for NLP, pages 245–254, Kyiv, Ukraine. Association for Computational Linguistics.
- Cite (Informal):
- Few-Shot Learning of an Interleaved Text Summarization Model by Pretraining with Synthetic Data (Karn et al., AdaptNLP 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2021.adaptnlp-1.24.pdf