Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages

Shauli Ravfogel, Yoav Goldberg, Tal Linzen


Abstract
How do typological properties such as word order and morphological case marking affect the ability of neural sequence models to acquire the syntax of a language? Cross-linguistic comparisons of RNNs’ syntactic performance (e.g., on subject-verb agreement prediction) are complicated by the fact that any two languages differ in multiple typological properties, as well as by differences in training corpus. We propose a paradigm that addresses these issues: we create synthetic versions of English, which differ from English in one or more typological parameters, and generate corpora for those languages based on a parsed English corpus. We report a series of experiments in which RNNs were trained to predict agreement features for verbs in each of those synthetic languages. Among other findings, (1) performance was higher in subject-verb-object order (as in English) than in subject-object-verb order (as in Japanese), suggesting that RNNs have a recency bias; (2) predicting agreement with both subject and object (polypersonal agreement) improves over predicting each separately, suggesting that underlying syntactic knowledge transfers across the two tasks; and (3) overt morphological case makes agreement prediction significantly easier, regardless of word order.
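The core idea of the paradigm, deriving a synthetic language by re-linearizing a parsed English sentence, can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the repository listed under Code); the `Token` class, the `linearize` function, and the toy sentence are hypothetical names and data introduced here for illustration. The sketch assumes a Universal Dependencies-style parse in which the subject and object attach to the verb with the labels `nsubj` and `obj`, and it changes only the relative order of subject, object, and verb, leaving all other dependents on their original side of the head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the original sentence
    form: str    # surface word form
    head: int    # idx of the head token, 0 for the root
    deprel: str  # dependency label, e.g. "nsubj" or "obj"


def linearize(tokens, head_idx, order):
    """Return the words of the subtree rooted at head_idx.

    `order` is a permutation of "s", "o", "v" (for example "sov") giving the
    target order of subject, object, and head. Dependents other than
    nsubj/obj stay on the side of the head where they originally appeared.
    """
    head = next(t for t in tokens if t.idx == head_idx)
    slots = {"s": [], "o": [], "v": [head.form]}
    left, right = [], []
    for child in (t for t in tokens if t.head == head_idx):
        words = linearize(tokens, child.idx, order)
        if child.deprel == "nsubj":
            slots["s"] += words
        elif child.deprel == "obj":
            slots["o"] += words
        elif child.idx < head_idx:
            left += words
        else:
            right += words
    core = [w for key in order for w in slots[key]]
    return left + core + right


# Toy parse of "the dog chases the cats" (hypothetical example data).
SENTENCE = [
    Token(1, "the", 2, "det"),
    Token(2, "dog", 3, "nsubj"),
    Token(3, "chases", 0, "root"),
    Token(4, "the", 5, "det"),
    Token(5, "cats", 3, "obj"),
]

root = next(t.idx for t in SENTENCE if t.head == 0)
print(" ".join(linearize(SENTENCE, root, "svo")))  # the dog chases the cats
print(" ".join(linearize(SENTENCE, root, "sov")))  # the dog the cats chases
```

Applying such a transformation to every sentence of a parsed corpus yields a corpus that differs from the English original in a single typological parameter. The paper's other manipulations, such as polypersonal agreement and overt case marking, are not covered by this sketch.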
Anthology ID:
N19-1356
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3532–3542
URL:
https://aclanthology.org/N19-1356
DOI:
10.18653/v1/N19-1356
Cite (ACL):
Shauli Ravfogel, Yoav Goldberg, and Tal Linzen. 2019. Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3532–3542, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages (Ravfogel et al., NAACL 2019)
PDF:
https://preview.aclanthology.org/ingestion-script-update/N19-1356.pdf
Video:
https://vimeo.com/347430311
Code:
Shaul1321/rnn_typology
Data:
Universal Dependencies