2024
Alleviating Exposure Bias in Abstractive Summarization via Sequentially Generating and Revising
Jiaxin Duan | Fengyu Lu | Junfei Liu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Abstractive summarization commonly suffers from exposure bias caused by supervised teacher-forcing training: a model predicts the next token conditioned on the ground-truth preceding context during training, but on its own preceding outputs at inference. Existing solutions bridge this gap through un- or semi-supervised holistic learning, yet still leave a risk of error accumulation while generating a summary. In this paper, we attribute this problem to the limitation of unidirectional autoregressive text generation and introduce post-processing steps to alleviate it. Specifically, we reformulate abstractive summarization as sequential generation and revision (SeGRe): in the revision phase, the model takes the generated summary as input again and refines it by contrasting it with the source document. This gives the model additional opportunities to assess the flawed summary from a global view and thereby modify inappropriate expressions. Moreover, we train the SeGRe model with a regularized minimum-risk policy to ensure effective generation and revision. Extensive comparative experiments on two well-known datasets show that SeGRe achieves new or matched state-of-the-art performance.
Prophecy Distillation for Boosting Abstractive Summarization
Jiaxin Duan | Fengyu Lu | Junfei Liu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Abstractive summarization models trained with maximum likelihood estimation (MLE) have long been prone to generating unfaithful facts and summaries with an ambiguous focus. An improved paradigm guided by reference-identified words, i.e., guided summarization, has shown remarkable advantages in overcoming this problem. However, its real-world applicability is limited because the prophetic guidance is unavailable at inference. In this paper, we introduce a novel teacher-student framework in which a regular summarization model learns to mimic the behavior of being guided by prophecy in order to improve abstractive summaries. Specifically, by training in probability spaces to follow and distinguish a guided teacher model, a student model learns the key to generating teacher-quality summaries without any guidance. We refer to this process as prophecy distillation; it overcomes the limitations of both standard and guided summarization. Through extensive experiments, we show that our method achieves new or matched state-of-the-art results on four well-known datasets in terms of ROUGE scores, faithfulness, and saliency awareness. Human evaluations are also carried out to support these merits. Furthermore, we conduct empirical studies to analyze how the hyperparameter settings and the guidance choice affect TPG performance.
2021
Combining Curriculum Learning and Knowledge Distillation for Dialogue Generation
Qingqing Zhu | Xiuying Chen | Pengfei Wu | JunFei Liu | Dongyan Zhao
Findings of the Association for Computational Linguistics: EMNLP 2021
Curriculum learning, a machine training strategy that feeds training instances to the model from easy to hard, has been shown to facilitate the dialogue generation task. Meanwhile, knowledge distillation, a methodology for transferring knowledge from teacher networks to student networks, can yield a significant performance boost for student models. Hence, in this paper, we introduce a combination of curriculum learning and knowledge distillation for efficient dialogue generation models, where curriculum learning helps knowledge distillation from both the data and the model aspects. First, from the data aspect, we cluster the training cases according to their complexity, which is calculated from various types of features such as sentence length and coherence between dialog pairs. Furthermore, we employ an adversarial training strategy to identify the complexity of cases at the model level. The intuition is that if a discriminator can tell whether the generated response comes from the teacher or the student, then the case is a difficult one that the student model has not yet adapted to. Finally, we use self-paced learning, an extension of curriculum learning, to assign weights for distillation. In summary, we arrange a hierarchical curriculum based on the above two aspects for the student model under the guidance of the teacher model. Experimental results demonstrate that our methods achieve improvements over competitive baselines.
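The data-level step described in this abstract (scoring cases by complexity and presenting them from easy to hard) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `complexity` score combines length with a crude lexical-overlap proxy for coherence, and equal-size bucketing stands in for the clustering the abstract mentions. It is not the authors' feature set or algorithm.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (dialogue context, response)


def complexity(context: str, response: str) -> float:
    """Hypothetical score: longer, less lexically related pairs count as harder."""
    ctx, rsp = context.lower().split(), response.lower().split()
    length = len(ctx) + len(rsp)
    union = set(ctx) | set(rsp)
    # Jaccard overlap as a rough stand-in for context-response coherence.
    overlap = len(set(ctx) & set(rsp)) / len(union) if union else 0.0
    return length * (1.0 - overlap)


def build_curriculum(pairs: List[Pair], n_buckets: int = 3) -> List[List[Pair]]:
    """Sort pairs by complexity and split them into buckets, easiest first."""
    if not pairs:
        return []
    ranked = sorted(pairs, key=lambda p: complexity(*p))
    size = -(-len(ranked) // n_buckets)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


pairs = [
    ("what did you think about the movie last night",
     "honestly the plot was thin but the soundtrack carried it"),
    ("how are you", "fine"),
    ("hi", "hi"),
]
buckets = build_curriculum(pairs, n_buckets=3)
```

A training loop under this sketch would iterate over `buckets` in order, so the student sees the shortest, most overlapping pairs before the long, loosely related ones.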
MedAI at SemEval-2021 Task 5: Start-to-end Tagging Framework for Toxic Spans Detection
Zhen Wang | Hongjie Fan | Junfei Liu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
This paper describes the system submitted to SemEval-2021 Task 5: Toxic Spans Detection. The task concerns evaluating systems that detect the spans that make a text toxic, when detecting such spans is possible. To address the possibly multi-span detection problem, we develop a start-to-end tagging framework on top of a RoBERTa-based language model. In addition, we design a custom loss function that takes distance into account. In comparison to other participating teams, our system achieved an F1 score of 69.03%, which is slightly lower (-1.8 and -1.73) than the top 1 (70.83%) and top 2 (70.77%) systems, respectively.
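One way to read "start-to-end tagging" for multi-span output is that the model scores each token as a possible span start and as a possible span end, and spans are then decoded by pairing the two. The sketch below shows only such a decoding step under that assumption. The function names, the 0.5 threshold, and the nearest-end pairing rule are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def decode_spans(start_scores: List[float], end_scores: List[float],
                 threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Pair each predicted start token with the nearest predicted end at or after it."""
    spans = []
    i, n = 0, len(start_scores)
    while i < n:
        if start_scores[i] >= threshold:
            j = i
            while j < n and end_scores[j] < threshold:
                j += 1
            if j < n:  # found a matching end; skip past the span
                spans.append((i, j))
                i = j + 1
                continue
        i += 1
    return spans


def to_char_offsets(spans: List[Tuple[int, int]],
                    token_offsets: List[Tuple[int, int]]) -> List[int]:
    """Expand token-level spans into the sorted character indices they cover."""
    chars = set()
    for start_tok, end_tok in spans:
        chars.update(range(token_offsets[start_tok][0], token_offsets[end_tok][1]))
    return sorted(chars)


# Made-up example: "you are a stupid idiot ok", whitespace-tokenized.
offsets = [(0, 3), (4, 7), (8, 9), (10, 16), (17, 22), (23, 25)]
start = [0.1, 0.1, 0.1, 0.9, 0.2, 0.1]
end = [0.0, 0.0, 0.0, 0.2, 0.8, 0.1]
spans = decode_spans(start, end)
toxic_chars = to_char_offsets(spans, offsets)
```

Because decoding restarts after each matched end token, the same routine returns several disjoint spans when more than one start is predicted.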
2020
Learn with Noisy Data via Unsupervised Loss Correction for Weakly Supervised Reading Comprehension
Xuemiao Zhang | Kun Zhou | Sirui Wang | Fuzheng Zhang | Zhongyuan Wang | Junfei Liu
Proceedings of the 28th International Conference on Computational Linguistics
The weakly supervised machine reading comprehension (MRC) task is practical and promising because its training data is massive and easily available, but it inevitably introduces noise. Existing methods usually incorporate extra submodels to help filter noise before the noisy data is fed to the main models. However, these multistage methods often make training difficult, and the quality of the submodels is hard to control. In this paper, we first explore and analyze the essential characteristics of noise from the perspective of loss distribution, and find that in the early stage of training, noisy samples usually lead to significantly larger loss values than clean ones. Based on this observation, we propose a hierarchical loss correction strategy to avoid fitting noise and to enhance clean supervision signals. The strategy includes using a Gaussian mixture model fitted in an unsupervised manner to calculate weight factors for all losses in order to correct the loss distribution, and employing a hard bootstrapping loss to modify the loss function. Experimental results on different weakly supervised MRC datasets show that the proposed methods can improve models significantly.
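The observation that noisy samples tend to have larger early-training losses suggests fitting a two-component Gaussian mixture to the per-sample losses and using the posterior of the low-loss component as a weight. The following is a minimal pure-Python sketch of that idea, assuming a 1-D mixture fitted with plain EM. The function name `clean_weights`, the initialization, and the iteration count are illustrative assumptions, and the paper's hierarchical strategy and hard bootstrapping loss are not reproduced here.

```python
import math
from typing import List


def _log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def clean_weights(losses: List[float], iters: int = 50) -> List[float]:
    """Fit a 2-component 1-D GMM with EM; return P(low-loss component | loss)."""
    if not losses:
        return []
    lo, hi = min(losses), max(losses)
    means = [lo, hi]  # assumed init: one component at each extreme
    spread = max((hi - lo) ** 2 / 4, 1e-6)
    variances = [spread, spread]
    mix = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in losses]
    for _ in range(iters):
        # E-step: responsibilities, computed in the log domain for stability.
        for i, x in enumerate(losses):
            logp = [math.log(mix[k]) + _log_gauss(x, means[k], variances[k])
                    for k in range(2)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp[i] = [v / total for v in p]
        # M-step: re-estimate each component, with floors to avoid collapse.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk,
                1e-6,
            )
            mix[k] = max(nk / len(losses), 1e-12)
    clean = 0 if means[0] <= means[1] else 1
    return [r[clean] for r in resp]


# Made-up per-sample losses: five small (likely clean), three large (likely noisy).
losses = [0.10, 0.12, 0.15, 0.11, 2.0, 2.3, 0.13, 2.1]
weights = clean_weights(losses)
weighted_loss = sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

Under this sketch, high-loss samples receive weights near zero and contribute little to `weighted_loss`, while low-loss samples keep weights near one.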