Abstract
Work using artificial languages as training input has shown that LSTMs are capable of inducing the stack-like data structures required to represent context-free and certain mildly context-sensitive languages — formal language classes which correspond in theory to the hierarchical structures of natural language. Here we present a suite of experiments probing whether neural language models trained on linguistic data induce these stack-like data structures and deploy them while incrementally predicting words. We study two natural language phenomena: center embedding sentences and syntactic island constraints on the filler–gap dependency. In order to properly predict words in these structures, a model must be able to temporarily suppress certain expectations and then recover those expectations later, essentially pushing and popping these expectations on a stack. Our results provide evidence that models can successfully suppress and recover expectations in many cases, but do not fully recover their previous grammatical state.
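The suppression-and-recovery behavior described above is typically probed by comparing per-word surprisal across minimally different sentences. Below is a minimal, hypothetical sketch of that kind of comparison; it uses GPT-2 through the Hugging Face transformers library purely as a convenient stand-in for the paper's LSTM language models, and the example sentences are illustrative rather than the paper's actual stimuli.

```python
# Minimal sketch (not the authors' code) of a surprisal-based probe: a model
# that handles center embedding should "suppress" its expectation for the
# matrix verb while inside the embedded clause and "recover" it once that
# clause closes. GPT-2 stands in for the paper's LSTM LMs trained on the
# Billion Word Benchmark; the sentences below are hypothetical examples.
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def surprisal(context: str, continuation: str) -> float:
    """Total surprisal, in bits, of `continuation` given `context`."""
    ctx_ids = tokenizer.encode(context)
    cont_ids = tokenizer.encode(continuation)
    input_ids = torch.tensor([ctx_ids + cont_ids])
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    total = 0.0
    for i, tok in enumerate(cont_ids):
        # The logits at position p predict the token at position p + 1.
        total -= log_probs[0, len(ctx_ids) + i - 1, tok].item() / math.log(2)
    return total


# If the model recovers its expectation for the matrix verb after the
# embedded clause, surprisal at "wrote ..." should stay low in both conditions.
print(surprisal("The author that the critic praised", " wrote the book."))
print(surprisal("The author", " wrote the book."))
```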
- Anthology ID: W19-4819
- Volume: Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
- Month: August
- Year: 2019
- Address: Florence, Italy
- Editors: Tal Linzen, Grzegorz Chrupała, Yonatan Belinkov, Dieuwke Hupkes
- Venue: BlackboxNLP
- Publisher: Association for Computational Linguistics
- Pages: 181–190
- URL: https://aclanthology.org/W19-4819
- DOI: 10.18653/v1/W19-4819
- Cite (ACL): Ethan Wilcox, Roger Levy, and Richard Futrell. 2019. Hierarchical Representation in Neural Language Models: Suppression and Recovery of Expectations. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 181–190, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal): Hierarchical Representation in Neural Language Models: Suppression and Recovery of Expectations (Wilcox et al., BlackboxNLP 2019)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/W19-4819.pdf
- Data: Billion Word Benchmark