Abstract
Although Transformers perform well on NLP tasks, recent studies suggest that self-attention is theoretically limited in learning even some regular and context-free languages. These findings motivated us to think about their implications for modeling natural language, which is hypothesized to be mildly context-sensitive. We test the Transformer's ability to learn mildly context-sensitive languages of varying complexities, and find that they generalize well to unseen in-distribution data, but their ability to extrapolate to longer strings is worse than that of LSTMs. Our analyses show that the learned self-attention patterns and representations modeled dependency relations and demonstrated counting behavior, which may have helped the models solve the languages.
- Anthology ID: 2023.blackboxnlp-1.21
- Volume: Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Yonatan Belinkov, Sophie Hao, Jaap Jumelet, Najoung Kim, Arya McCarthy, Hosein Mohebbi
- Venues: BlackboxNLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 271–283
- URL: https://aclanthology.org/2023.blackboxnlp-1.21
- DOI: 10.18653/v1/2023.blackboxnlp-1.21
- Cite (ACL): Shunjie Wang and Shane Steinert-Threlkeld. 2023. Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages. In Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 271–283, Singapore. Association for Computational Linguistics.
- Cite (Informal): Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages (Wang & Steinert-Threlkeld, BlackboxNLP-WS 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2023.blackboxnlp-1.21.pdf