2024
Unveiling the Magic: Investigating Attention Distillation in Retrieval-Augmented Generation
Zizhong Li | Haopeng Zhang | Jiawei Zhang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
The retrieval-augmented generation framework addresses the limitations of large language models by enabling real-time knowledge updates for more accurate answers. An efficient technique in the training phase of retrieval-augmented models is attention distillation, which uses attention scores as supervision signals instead of manually annotated query-document pairs. Despite its growing popularity, the detailed mechanisms behind the success of attention distillation remain unexplored, particularly the specific patterns it leverages to benefit training. In this paper, we address this gap by conducting a comprehensive investigation of the attention distillation workflow and identifying the key factors that influence the learning performance of retrieval-augmented language models. We further propose several insightful indicators for optimizing models’ training methods and avoiding ineffective training.
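To make the mechanism concrete, here is a minimal sketch of how attention distillation is commonly implemented: the retriever's score distribution over retrieved passages is trained to match the reader's (detached) attention distribution via a KL-divergence loss. The tensor shapes, softmax normalization, and the way reader attention is aggregated are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attention_distillation_loss(retriever_scores, reader_attention):
    """retriever_scores: (batch, n_docs) query-passage similarity scores from the retriever.
    reader_attention: (batch, n_docs) attention mass the reader assigns to each retrieved
    passage, aggregated over layers/heads/tokens (assumed to be precomputed)."""
    target = F.softmax(reader_attention.detach(), dim=-1)     # supervision signal, no gradient
    log_pred = F.log_softmax(retriever_scores, dim=-1)        # retriever's distribution
    return F.kl_div(log_pred, target, reduction="batchmean")  # pull retriever toward reader attention

# Toy example with random tensors standing in for real model outputs.
loss = attention_distillation_loss(torch.randn(2, 5, requires_grad=True), torch.rand(2, 5))
loss.backward()
```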
XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates
Haopeng Zhang | Hayate Iso | Sairam Gurajada | Nikita Bhutani
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Text editing is a crucial task that modifies text to better align with user intent. However, existing text editing benchmark datasets contain only coarse-grained instructions and lack explainability, resulting in outputs that deviate from the intended changes outlined in the gold reference. To comprehensively investigate the text editing capabilities of large language models (LLMs), this paper introduces XATU, the first benchmark specifically designed for fine-grained instruction-based explainable text editing. XATU covers fine-grained text editing tasks of varying difficulty (simplification, grammar check, fact-check, etc.), incorporating lexical, syntactic, semantic, and knowledge-intensive edit aspects. To enhance interpretability, we combine LLM-based annotation with human annotation, resulting in a benchmark that includes fine-grained instructions and gold-standard edit explanations. By evaluating existing LLMs against our benchmark, we demonstrate the effectiveness of instruction tuning and the impact of the underlying architecture across various editing tasks. Furthermore, extensive experimentation reveals the significant role of explanations in fine-tuning language models for text editing tasks. The benchmark will be open-sourced to support reproduction and facilitate future research at https://github.com/megagonlabs/xatu.
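For intuition only, a hypothetical instance of fine-grained, explainable text editing might look like the sketch below; the field names and values are invented for illustration and are not XATU's actual schema (see the linked repository for the real format).

```python
# Hypothetical instance illustrating fine-grained, explainable text editing.
# Field names are invented for this sketch and are not XATU's actual schema.
example = {
    "source": "Their going to finalize the the report tomorrow.",
    "instruction": "Fix the grammatical error and remove the duplicated word.",
    "target": "They're going to finalize the report tomorrow.",
    "explanation": "Replaced 'Their' with 'They're' and deleted the repeated 'the'.",
    "edit_aspect": "lexical (grammar check)",
}
```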
2023
DiffuSum: Generation Enhanced Extractive Summarization with Diffusion
Haopeng Zhang | Xiao Liu | Jiawei Zhang
Findings of the Association for Computational Linguistics: ACL 2023
Extractive summarization aims to form a summary by directly extracting sentences from the source document. Existing works mostly formulate it as a sequence labeling problem, making individual label predictions for each sentence. This paper proposes DiffuSum, a novel paradigm for extractive summarization that directly generates the desired summary sentence representations with diffusion models and extracts sentences based on sentence representation matching. In addition, DiffuSum jointly optimizes a contrastive sentence encoder with a matching loss for sentence representation alignment and a multi-class contrastive loss for representation diversity. Experimental results show that DiffuSum achieves new state-of-the-art extractive results on CNN/DailyMail with ROUGE scores of 44.83/22.56/40.56. Experiments on two other datasets with different summary lengths and cross-dataset evaluation also demonstrate the effectiveness of DiffuSum. The strong performance of our framework shows the great potential of adapting generative models for extractive summarization.
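As a rough illustration of the matching step, the sketch below selects, for each generated summary sentence representation, the closest document sentence by cosine similarity. The dimensions and the use of simple argmax matching are assumptions for illustration, not DiffuSum's exact matching procedure.

```python
import torch
import torch.nn.functional as F

def extract_by_matching(generated_reps, doc_sentence_reps):
    """generated_reps: (k, d) summary sentence vectors produced by a generative model.
    doc_sentence_reps: (n, d) encoder representations of the document's sentences.
    Returns indices of the k document sentences selected as the extractive summary."""
    sims = F.cosine_similarity(generated_reps.unsqueeze(1),
                               doc_sentence_reps.unsqueeze(0), dim=-1)  # (k, n) similarities
    return sims.argmax(dim=-1).tolist()  # closest document sentence per generated vector

# Toy example with random vectors standing in for real representations.
print(extract_by_matching(torch.randn(3, 768), torch.randn(20, 768)))
```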
Extractive Summarization via ChatGPT for Faithful Summary Generation
Haopeng Zhang | Xiao Liu | Jiawei Zhang
Findings of the Association for Computational Linguistics: EMNLP 2023
Extractive summarization is a crucial task in natural language processing that aims to condense long documents into shorter versions by directly extracting sentences. The recent introduction of large language models has attracted significant interest in the NLP community due to their remarkable performance on a wide range of downstream tasks. This paper first presents a thorough evaluation of ChatGPT’s performance on extractive summarization and compares it with traditional fine-tuning methods on various benchmark datasets. Our experimental analysis reveals that ChatGPT exhibits inferior extractive summarization performance in terms of ROUGE scores compared to existing supervised systems, while achieving higher performance on LLM-based evaluation metrics. In addition, we explore the effectiveness of in-context learning and chain-of-thought reasoning for enhancing its performance. Furthermore, we find that applying an extract-then-generate pipeline with ChatGPT yields significant improvements over abstractive baselines in terms of summary faithfulness. These observations highlight potential directions for enhancing ChatGPT’s capabilities in faithful summarization using two-stage approaches.
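A minimal sketch of an extract-then-generate pipeline is shown below; the prompts and the chat() helper are placeholders for an LLM API call and are not the paper's exact prompts.

```python
def chat(prompt: str) -> str:
    """Placeholder for an LLM API call (e.g., a chat-completion endpoint)."""
    raise NotImplementedError("plug in your LLM client here")

def extract_then_generate(document: str, num_sentences: int = 3) -> str:
    # Stage 1 (extract): ask the model to copy salient sentences verbatim.
    extracted = chat(
        f"Copy the {num_sentences} most important sentences from the document verbatim, "
        f"one per line.\n\nDocument:\n{document}"
    )
    # Stage 2 (generate): rewrite only those sentences, keeping the summary grounded in source text.
    return chat(f"Rewrite the following sentences into a concise, fluent summary:\n{extracted}")
```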
SummIt: Iterative Text Summarization via ChatGPT
Haopeng Zhang | Xiao Liu | Jiawei Zhang
Findings of the Association for Computational Linguistics: EMNLP 2023
Existing text summarization systems have made significant progress in recent years, but typically generate summaries in a single step. The one-shot summarization setting is sometimes inadequate, however, as the generated summary may contain hallucinations or overlook important details related to the reader’s interests. In this paper, we address this limitation by proposing SummIt, an iterative text summarization framework based on large language models like ChatGPT. Our framework enables the model to refine the generated summary iteratively through self-evaluation and feedback, closely resembling the iterative process humans undertake when drafting and revising summaries. Furthermore, we explore the potential benefits of integrating knowledge and topic extractors into the framework to enhance summary faithfulness and controllability. We evaluate the performance of our framework on three benchmark summarization datasets through empirical and qualitative analyses. We also conduct a human evaluation to validate the effectiveness of the model’s refinements and find a potential issue of over-correction.
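The sketch below illustrates the general shape of such a summarize-evaluate-refine loop; the chat() helper, the prompts, and the stopping rule are assumptions for illustration rather than SummIt's exact design.

```python
def chat(prompt: str) -> str:
    """Placeholder for an LLM API call."""
    raise NotImplementedError("plug in your LLM client here")

def iterative_summarize(document: str, max_iters: int = 3) -> str:
    summary = chat(f"Summarize the following document:\n{document}")
    for _ in range(max_iters):
        # Self-evaluation: ask the model to critique its own draft.
        feedback = chat(
            "List any unsupported statements or missing key points in the summary, "
            "or reply 'no issues'.\n\nDocument:\n" + document + "\n\nSummary:\n" + summary
        )
        if "no issues" in feedback.lower():  # naive stopping rule
            break
        # Refinement: revise the draft to address the feedback.
        summary = chat(
            f"Revise the summary to address this feedback.\n\nFeedback:\n{feedback}\n\nSummary:\n{summary}"
        )
    return summary
```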
Unsupervised Multi-document Summarization with Holistic Inference
Haopeng Zhang | Sangwoo Cho | Kaiqiang Song | Xiaoyang Wang | Hongwei Wang | Jiawei Zhang | Dong Yu
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)
Contrastive Hierarchical Discourse Graph for Scientific Document Summarization
Haopeng Zhang | Xiao Liu | Jiawei Zhang
Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)
The extended structural context has made scientific paper summarization a challenging task. This paper proposes CHANGES, a contrastive hierarchical graph neural network for extractive scientific paper summarization. CHANGES represents a scientific paper as a hierarchical discourse graph and learns effective sentence representations with purpose-designed hierarchical graph information aggregation. We also propose a graph contrastive learning module to learn global, theme-aware sentence representations. Extensive experiments on the PubMed and arXiv benchmark datasets demonstrate the effectiveness of CHANGES and the importance of capturing hierarchical structure information when modeling scientific papers.
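As a simplified illustration of a hierarchical discourse graph, the sketch below connects sentence nodes to their section node and section nodes to a document node; the construction details are assumptions and do not reproduce CHANGES exactly.

```python
import networkx as nx

def build_hierarchical_graph(sections):
    """sections: dict mapping section title -> list of sentences in that section."""
    g = nx.Graph()
    g.add_node("DOC", level="document")
    for title, sentences in sections.items():
        g.add_node(title, level="section")
        g.add_edge("DOC", title)                         # document-section edge
        for i, sent in enumerate(sentences):
            node = f"{title}/s{i}"
            g.add_node(node, level="sentence", text=sent)
            g.add_edge(title, node)                      # section-sentence edge
    return g

g = build_hierarchical_graph({"Introduction": ["First sentence.", "Second sentence."],
                              "Method": ["We propose X."]})
print(g.number_of_nodes(), g.number_of_edges())  # 6 nodes, 5 edges
```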
2022
HEGEL: Hypergraph Transformer for Long Document Summarization
Haopeng Zhang | Xiao Liu | Jiawei Zhang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Extractive summarization for long documents is challenging because of the extended, structured input context. Long-distance sentence dependencies hinder cross-sentence relation modeling, the critical step of extractive summarization. This paper proposes HEGEL, a hypergraph neural network for long document summarization that captures high-order cross-sentence relations. HEGEL updates and learns effective sentence representations with hypergraph transformer layers and fuses different types of sentence dependencies, including latent topics, keyword coreference, and section structure. We validate HEGEL with extensive experiments on two benchmark datasets, and the results demonstrate the effectiveness and efficiency of HEGEL.
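For intuition, the sketch below builds a sentence-hyperedge incidence matrix from keyword and section hyperedges, a simplified stand-in for the kinds of sentence dependencies HEGEL fuses; it is not the paper's exact graph construction.

```python
import numpy as np

def incidence_matrix(sent_keywords, sent_sections):
    """sent_keywords: list of keyword sets, one per sentence.
    sent_sections: section id for each sentence.
    Returns a (num_sentences, num_hyperedges) 0/1 incidence matrix and the hyperedge list."""
    keywords = sorted({k for ks in sent_keywords for k in ks})
    sections = sorted(set(sent_sections))
    hyperedges = [("kw", k) for k in keywords] + [("sec", s) for s in sections]
    H = np.zeros((len(sent_keywords), len(hyperedges)))
    for i, (ks, sec) in enumerate(zip(sent_keywords, sent_sections)):
        for j, (kind, val) in enumerate(hyperedges):
            if (kind == "kw" and val in ks) or (kind == "sec" and val == sec):
                H[i, j] = 1.0  # sentence i participates in hyperedge j
    return H, hyperedges

H, edges = incidence_matrix([{"graph"}, {"graph", "summary"}, {"summary"}], [0, 0, 1])
print(H)
```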
Improving the Faithfulness of Abstractive Summarization via Entity Coverage Control
Haopeng Zhang | Semih Yavuz | Wojciech Kryscinski | Kazuma Hashimoto | Yingbo Zhou
Findings of the Association for Computational Linguistics: NAACL 2022
Abstractive summarization systems leveraging pre-trained language models have achieved superior results on benchmark datasets. However, such models have been shown to be more prone to hallucinating facts that are unfaithful to the input context. In this paper, we propose a method to remedy entity-level extrinsic hallucinations with Entity Coverage Control (ECC). We first compute entity coverage precision and prepend the corresponding control code to each training example, which implicitly guides the model to recognize faithful content during training. We further extend our method via intermediate fine-tuning on large but noisy data extracted from Wikipedia to unlock zero-shot summarization. Experimental results on three benchmark datasets of very different domains and styles (XSum, PubMed, and SAMSum) show that the proposed method leads to more faithful and salient abstractive summarization in both supervised fine-tuning and zero-shot settings.
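A minimal sketch of the entity coverage idea follows: compute how many summary entities also appear in the source, bucket that precision, and prepend the bucket as a control token. The spaCy-based entity extraction, bucketing scheme, and token format are assumptions, not the paper's exact setup.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any NER model would do

def entity_set(text: str) -> set:
    return {ent.text.lower() for ent in nlp(text).ents}

def prepend_ecc_code(source: str, reference_summary: str, num_buckets: int = 5) -> str:
    """Compute entity coverage precision of the reference summary against the source,
    bucket it, and prepend the bucket as a control code to the training input."""
    summary_ents = entity_set(reference_summary)
    if not summary_ents:
        precision = 1.0
    else:
        precision = len(summary_ents & entity_set(source)) / len(summary_ents)
    bucket = min(int(precision * num_buckets), num_buckets - 1)
    return f"<ecc_{bucket}> {source}"  # hypothetical control-token format
```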
2020
Text Graph Transformer for Document Classification
Haopeng Zhang | Jiawei Zhang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Text classification is a fundamental problem in natural language processing. Recent studies have applied graph neural network (GNN) techniques to capture global word co-occurrence in a corpus. However, previous works do not scale to large corpora and ignore the heterogeneity of the text graph. To address these problems, we introduce a novel Transformer-based heterogeneous graph neural network, namely Text Graph Transformer (TG-Transformer). Our model learns effective node representations by capturing structure and heterogeneity from the text graph. We propose a mini-batch text graph sampling method that significantly reduces computing and memory costs to handle large corpora. Extensive experiments have been conducted on several benchmark datasets, and the results demonstrate that TG-Transformer outperforms state-of-the-art approaches on the text classification task.
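The sketch below shows a generic fixed-size neighbor sampling routine of the kind used for mini-batch training on large text graphs; it illustrates the idea only and is not TG-Transformer's exact sampling strategy.

```python
import random

def sample_subgraph(adj, batch_nodes, num_neighbors=5, seed=0):
    """adj: dict mapping a node to its list of neighbors (e.g., word and document nodes).
    batch_nodes: document nodes in the current mini-batch.
    Returns the sampled node set and the sampled edges."""
    rng = random.Random(seed)
    nodes, edges = set(batch_nodes), []
    for node in batch_nodes:
        neighbors = adj.get(node, [])
        for nb in rng.sample(neighbors, min(num_neighbors, len(neighbors))):
            nodes.add(nb)             # keep only a bounded neighborhood per target node
            edges.append((node, nb))
    return nodes, edges

adj = {"doc0": ["w1", "w2", "w3", "w4", "w5", "w6"], "doc1": ["w2", "w7"]}
print(sample_subgraph(adj, ["doc0", "doc1"], num_neighbors=3))
```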