Abstract
We focus on the recognition of Dyck-n (Dn) languages with self-attention (SA) networks, which has been deemed to be a difficult task for these networks. We compare the performance of two variants of SA, one with a starting symbol (SA+) and one without (SA-). Our results show that SA+ is able to generalize to longer sequences and deeper dependencies. For D2, we find that SA- completely breaks down on long sequences whereas the accuracy of SA+ is 58.82%. We find attention maps learned by SA+ to be amenable to interpretation and compatible with a stack-based language recognizer. Surprisingly, the performance of SA networks is at par with LSTMs, which provides evidence on the ability of SA to learn hierarchies without recursion.- Anthology ID:
- 2020.findings-emnlp.384
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2020
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Trevor Cohn, Yulan He, Yang Liu
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 4301–4306
- Language:
- URL:
- https://aclanthology.org/2020.findings-emnlp.384
- DOI:
- 10.18653/v1/2020.findings-emnlp.384
- Cite (ACL):
- Javid Ebrahimi, Dhruv Gelda, and Wei Zhang. 2020. How Can Self-Attention Networks Recognize Dyck-n Languages?. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4301–4306, Online. Association for Computational Linguistics.
- Cite (Informal):
- How Can Self-Attention Networks Recognize Dyck-n Languages? (Ebrahimi et al., Findings 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2020.findings-emnlp.384.pdf