Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers

Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du


Abstract
Although dominant in natural language processing, transformer-based models still struggle with long-sequence processing because the computational cost of their self-attention operations grows quadratically with the length of the input sequence. To address this challenge, we propose a **Sim**ple framework that enhances the long-content processing of off-the-shelf pre-trained transformers in three steps: **C**hunk, **A**lign, and **S**elect (SimCAS). Specifically, we first divide each long-sequence input into a batch of chunks, then align the inter-chunk information during the encoding steps, and finally select the most representative hidden states from the encoder for the decoding process. With SimCAS, the computation and memory costs scale linearly with the input length. In experiments on various real-world long-text summarization and reading comprehension tasks, SimCAS significantly outperforms prior long-sequence processing baselines. The code is at [https://github.com/xjw-nlp/SimCAS](https://github.com/xjw-nlp/SimCAS).
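As a rough illustration of the chunking step and of why it can make attention cost linear in the input length, the sketch below splits a token sequence into fixed-size chunks and counts token-pair scores for full attention versus per-chunk attention. It is not the authors' implementation (see their repository for that): the function names, the padding scheme, and the chunk size of 512 are assumptions made for this example, and the align and select steps are not modeled.

```python
from typing import List


def chunk_sequence(token_ids: List[int], chunk_size: int, pad_id: int = 0) -> List[List[int]]:
    """Split a token sequence into equal-length chunks (hypothetical helper).

    The last chunk is right-padded with pad_id so all chunks can be
    stacked into one batch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), chunk_size)]
    if chunks and len(chunks[-1]) < chunk_size:
        chunks[-1] = chunks[-1] + [pad_id] * (chunk_size - len(chunks[-1]))
    return chunks


def full_attention_pairs(seq_len: int) -> int:
    """Token pairs scored when every token attends to every token: n^2."""
    return seq_len * seq_len


def chunked_attention_pairs(seq_len: int, chunk_size: int) -> int:
    """Token pairs scored when attention stays inside each chunk.

    With ceil(n / c) chunks of size c, the count is ceil(n / c) * c^2,
    which grows linearly in n for a fixed chunk size c.
    """
    num_chunks = -(-seq_len // chunk_size)  # ceiling division
    return num_chunks * chunk_size * chunk_size


# Toy input: 10 tokens split into chunks of 4 (last chunk padded with 0).
chunks = chunk_sequence(list(range(1, 11)), chunk_size=4)

# Assumed sizes for the cost comparison: 16,384 tokens, chunks of 512.
full_cost = full_attention_pairs(16384)
chunked_cost = chunked_attention_pairs(16384, 512)
```

Under these assumptions, doubling the input length doubles `chunked_attention_pairs` but quadruples `full_attention_pairs`, which is the scaling difference the abstract refers to.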
Anthology ID:
2024.acl-long.729
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
13500–13519
URL:
https://aclanthology.org/2024.acl-long.729
DOI:
10.18653/v1/2024.acl-long.729
Cite (ACL):
Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, and Nan Du. 2024. Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13500–13519, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers (Xie et al., ACL 2024)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2024.acl-long.729.pdf
Video:
https://preview.aclanthology.org/add_acl24_videos/2024.acl-long.729.mp4