Abstract
We propose a novel method to sparsify attention in the Transformer model by learning to select the most informative token representations during the training process, thus focusing on the task-specific parts of an input. A reduction of quadratic time and memory complexity to sublinear was achieved thanks to a robust trainable top-k operator. Our experiments on a challenging long-document summarization task show that even our simple baseline performs comparably to the current SOTA, and with trainable pooling we can retain its top quality while being 1.8× faster during training, 4.5× faster during inference, and up to 13× more computationally efficient in the decoder.
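To make the idea of a trainable top-k operator concrete, here is a minimal, hypothetical PyTorch sketch of top-k representation pooling: a learned linear scorer ranks the token representations, the k highest-scoring ones are kept, and the kept vectors are rescaled by their gated scores so the scorer stays on the gradient path. This illustrates the general mechanism only; the class and parameter names are assumptions, and it is not the exact operator used in the paper (see the authors' applicaai/pyramidions repository for that).

```python
import torch
import torch.nn as nn


class TopKPooling(nn.Module):
    """Hypothetical sketch: keep the k highest-scoring token representations."""

    def __init__(self, hidden_size: int, k: int):
        super().__init__()
        self.k = k
        self.scorer = nn.Linear(hidden_size, 1)  # learned per-token relevance score

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        scores = self.scorer(hidden_states).squeeze(-1)       # (batch, seq_len)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)   # hard top-k selection
        gates = torch.sigmoid(topk_scores).unsqueeze(-1)      # (batch, k, 1)
        idx = topk_idx.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
        selected = hidden_states.gather(1, idx)               # (batch, k, hidden_size)
        # Rescaling by the gate keeps the scorer differentiable end to end.
        return selected * gates


# Usage: pool a 1024-token sequence down to 64 representations.
pool = TopKPooling(hidden_size=768, k=64)
x = torch.randn(2, 1024, 768)
pooled = pool(x)  # shape: (2, 64, 768)
```

Downstream attention layers then operate on the pooled sequence of length k instead of the full input, which is what reduces the quadratic cost.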
- Anthology ID: 2022.acl-long.590
- Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: May
- Year: 2022
- Address: Dublin, Ireland
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 8616–8633
- URL: https://aclanthology.org/2022.acl-long.590
- DOI: 10.18653/v1/2022.acl-long.590
- Cite (ACL): Michał Pietruszka, Łukasz Borchmann, and Łukasz Garncarek. 2022. Sparsifying Transformer Models with Trainable Representation Pooling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8616–8633, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal): Sparsifying Transformer Models with Trainable Representation Pooling (Pietruszka et al., ACL 2022)
- PDF: https://preview.aclanthology.org/nodalida-main-page/2022.acl-long.590.pdf
- Code: applicaai/pyramidions
- Data: Pubmed, arXiv Summarization Dataset