Suyuchen Wang
2025
FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval
Jinlin Wang
|
Suyuchen Wang
|
Ziwen Xia
|
Sirui Hong
|
Yun Zhu
|
Bang Liu
|
Chenglin Wu
Findings of the Association for Computational Linguistics: NAACL 2025
Large Language Models (LLMs) are proficient at retrieving single facts from extended contexts, yet they struggle with tasks requiring the simultaneous retrieval of multiple facts, especially during generation. This paper identifies a novel “lost-in-the-middle” phenomenon, where LLMs progressively lose track of critical information throughout the generation process, resulting in incomplete or inaccurate retrieval. To address this challenge, we introduce Find All Crucial Texts (FACT), an iterative retrieval method that refines context through successive rounds of rewriting. This approach enables models to capture essential facts incrementally, which are often overlooked in single-pass retrieval. Experiments demonstrate that FACT substantially enhances multi-fact retrieval performance across various tasks, though improvements are less notable in general-purpose QA scenarios. Our findings shed light on the limitations of LLMs in multi-fact retrieval and underscore the need for more resilient long-context retrieval strategies.
2024
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Suyuchen Wang
|
Ivan Kobyzev
|
Peng Lu
|
Mehdi Rezagholizadeh
|
Bang Liu
Findings of the Association for Computational Linguistics: ACL 2024
This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequences face difficulty with out-of-distribution (OOD) token positions in longer sequences. We introduce Resonance RoPE, a novel approach designed to narrow the generalization gap in TSTL scenarios by refining the interpolation of RoPE features for OOD positions, significantly improving the model performance without additional online computational costs. Furthermore, we present PosGen, a new synthetic benchmark specifically designed for fine-grained behavior analysis in TSTL scenarios, aiming to isolate the constantly increasing difficulty of token generation on long contexts from the challenges of recognizing new token positions. Our experiments on synthetic tasks show that after applying Resonance RoPE, Transformers recognize OOD position better and more robustly. Our extensive LLM experiments also show superior performance after applying Resonance RoPE to the current state-of-the-art RoPE scaling method, YaRN, on both upstream language modeling tasks and a variety of downstream long-text applications.
2023
Efficient Classification of Long Documents via State-Space Models
Peng Lu
|
Suyuchen Wang
|
Mehdi Rezagholizadeh
|
Bang Liu
|
Ivan Kobyzev
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Transformer-based models have achieved state-of-the-art performance on numerous NLP applications. However, long documents which are prevalent in real-world scenarios cannot be efficiently processed by transformers with the vanilla self-attention module due to their quadratic computation complexity and limited length extrapolation ability. Instead of tackling the computation difficulty for self-attention with sparse or hierarchical structures, in this paper, we investigate the use of State-Space Models (SSMs) for long document classification tasks. We conducted extensive experiments on six long document classification datasets, including binary, multi-class, and multi-label classification, comparing SSMs (with and without pre-training) to self-attention-based models. We also introduce the SSM-pooler model and demonstrate that it achieves comparable performance while being on average 36% more efficient. Additionally our method exhibits higher robustness to the input noise even in the extreme scenario of 40%.
Search
Fix data
Co-authors
- Bang Liu 3
- Ivan Kobyzev 2
- Peng Lu 2
- Mehdi Rezagholizadeh 2
- Sirui Hong 1
- show all...