Transformer Working Memory Enables Regular Language Reasoning And Natural Language Length Extrapolation

Ta-Chung Chi, Ting-Han Fan, Alexander Rudnicky, Peter Ramadge


Abstract
Conventional wisdom holds that, unlike recurrent models, Transformers cannot perfectly model regular languages. Inspired by the notion of working memory, we propose a new Transformer variant named RegularGPT. With its novel combination of Weight-Sharing, Adaptive-Depth, and Sliding-Dilated-Attention, RegularGPT constructs working memory along the depth dimension, thereby enabling efficient and successful modeling of regular languages such as PARITY. We further test RegularGPT on the task of natural language length extrapolation and, surprisingly, find that it rediscovers the local windowed attention effect that prior work deemed necessary for length extrapolation.
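To make the mechanism in the abstract concrete, below is a minimal sketch (not the authors' code) of how a weight-shared, sliding-dilated attention pattern with adaptive depth can aggregate an entire sequence in logarithmic depth, using PARITY as the example task. The window size k = 2, the dilation k**layer at each layer, the stopping rule for the depth, and the XOR reduction standing in for a learned shared layer are all illustrative assumptions; the exact RegularGPT configuration is specified in the paper.

def sliding_dilated_mask(seq_len, layer, k=2):
    # Boolean causal mask for one layer: position i attends to itself and to
    # the (k - 1) earlier positions spaced k**layer apart (a dilated window).
    dilation = k ** layer
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(k):
            src = i - j * dilation
            if src >= 0:
                mask[i][src] = True
    return mask

def parity_by_depth(bits, k=2):
    # Adaptive depth: just enough layers for k**depth to cover the whole input.
    n = len(bits)
    depth = 1
    while k ** depth < n:
        depth += 1
    state = list(bits)
    for layer in range(depth):
        mask = sliding_dilated_mask(n, layer, k)
        # One weight-shared "layer": every position reduces (here: XORs)
        # whatever the dilated window lets it attend to.
        new_state = []
        for i in range(n):
            acc = 0
            for j in range(n):
                if mask[i][j]:
                    acc ^= state[j]
            new_state.append(acc)
        state = new_state
    # After roughly log_k(n) layers, the last position has seen every input bit.
    return state[-1]

bits = [1, 0, 1, 1, 0, 1, 0, 1]
assert parity_by_depth(bits) == sum(bits) % 2
print(parity_by_depth(bits))  # -> 1

The point of the sketch is only the depth argument: one shared local operation applied for about log_k(n) layers lets every position aggregate its whole prefix, which is the working-memory-along-depth effect described in the abstract; in the actual model the reduction is a learned Transformer block rather than a hard-coded XOR.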
Anthology ID: 2023.findings-emnlp.397
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5972–5984
URL: https://aclanthology.org/2023.findings-emnlp.397
DOI: 10.18653/v1/2023.findings-emnlp.397
Cite (ACL): Ta-Chung Chi, Ting-Han Fan, Alexander Rudnicky, and Peter Ramadge. 2023. Transformer Working Memory Enables Regular Language Reasoning And Natural Language Length Extrapolation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5972–5984, Singapore. Association for Computational Linguistics.
Cite (Informal): Transformer Working Memory Enables Regular Language Reasoning And Natural Language Length Extrapolation (Chi et al., Findings 2023)
PDF: https://preview.aclanthology.org/naacl24-info/2023.findings-emnlp.397.pdf