Fengyu Lu
2024
Alleviating Exposure Bias in Abstractive Summarization via Sequentially Generating and Revising
Jiaxin Duan | Fengyu Lu | Junfei Liu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Abstractive summarization commonly suffers from exposure bias caused by supervised teacher-forcing learning: a model predicts the next token conditioned on the ground-truth preceding context during training, but on its own preceding outputs at inference. Existing solutions bridge this gap through un- or semi-supervised holistic learning, yet still leave the risk of error accumulation while generating a summary. In this paper, we attribute this problem to the limitation of unidirectional autoregressive text generation and introduce post-processing steps to alleviate it. Specifically, we reformulate abstractive summarization as sequential generation and revision (SeGRe): in the revision phase, the model takes the generated summary as input again and refines it by contrasting it with the source document. This gives the model additional opportunities to assess the flawed summary from a global view and thereby modify inappropriate expressions. Moreover, we train the SeGRe model with a regularized minimum-risk policy to ensure effective generation and revision. Extensive comparative experiments on two well-known datasets show that SeGRe achieves new or matched state-of-the-art performance.
Prophecy Distillation for Boosting Abstractive Summarization
Jiaxin Duan | Fengyu Lu | Junfei Liu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Abstractive summarization models trained with maximum likelihood estimation (MLE) have long been prone to generating unfaithful facts and summaries with ambiguous focus. An improved paradigm that operates under the guidance of reference-identified words, i.e., guided summarization, has shown remarkable advantages in overcoming this problem. However, it has limited practical application because the prophetic guidance is unavailable at inference. In this paper, we introduce a novel teacher-student framework that trains a regular summarization model to mimic the behavior of being guided by prophecy, thereby improving its abstractive summaries. Specifically, by training in probability spaces to follow and distinguish a guided teacher model, a student model learns the key to generating teacher-quality summaries without any guidance. We refer to this process as prophecy distillation; it overcomes the limitations of both standard and guided summarization. Through extensive experiments, we show that our method achieves new or matched state-of-the-art results on four well-known datasets in terms of ROUGE scores, faithfulness, and saliency awareness. Human evaluations are also carried out to support these findings. Furthermore, we conduct empirical studies to analyze how the hyperparameter settings and the choice of guidance affect TPG performance.