Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis

Ta-Chung Chi, Ting-Han Fan, Alexander Rudnicky, Peter Ramadge


Abstract
Length extrapolation permits training a transformer language model on short sequences that preserves perplexities when tested on substantially longer sequences. A relative positional embedding design, ALiBi, has had the widest usage to date. We dissect ALiBi via the lens of receptive field analysis empowered by a novel cumulative normalized gradient tool. The concept of receptive field further allows us to modify the vanilla Sinusoidal positional embedding to create Sandwich, the first parameter-free relative positional embedding design that truly uses length information longer than the training sequence. Sandwich shares with KERPLE and T5 the same logarithmic decaying temporal bias pattern with learnable relative positional embeddings; these elucidate future extrapolatable positional embedding design.
Anthology ID:
2023.acl-long.756
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
13522–13537
URL:
https://aclanthology.org/2023.acl-long.756
DOI:
10.18653/v1/2023.acl-long.756
Award:
Outstanding Paper Award
Cite (ACL):
Ta-Chung Chi, Ting-Han Fan, Alexander Rudnicky, and Peter Ramadge. 2023. Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13522–13537, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis (Chi et al., ACL 2023)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2023.acl-long.756.pdf