Targeted Syntactic Evaluation of Language Models

Rebecca Marvin, Tal Linzen


Abstract
We present a data set for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this data set, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM’s accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.
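The evaluation protocol is simple enough to sketch in a few lines: score both members of a minimal pair under a language model and count the pair as correct if the grammatical sentence receives the higher probability. The following is a minimal Python sketch of that comparison, not the authors' implementation (which is in the linked repository). It assumes GPT-2 via the Hugging Face transformers library purely as a stand-in autoregressive scorer (the paper evaluates an LSTM language model), and the agreement pair is illustrative.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Stand-in scorer; the paper uses an LSTM LM, not GPT-2.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Total log-probability of `sentence` under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids, the returned loss is the mean
        # negative log-likelihood per predicted token; multiply by the
        # number of predicted positions to recover the summed log-prob.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

# Illustrative subject-verb agreement minimal pair
# (grammatical vs. ungrammatical across an object relative clause).
grammatical = "The author that the guards like laughs."
ungrammatical = "The author that the guards like laugh."

correct = sentence_log_prob(grammatical) > sentence_log_prob(ungrammatical)
print("Model prefers the grammatical sentence:", correct)

Accuracy on the full data set is then the fraction of minimal pairs for which this comparison comes out in favor of the grammatical sentence.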
Anthology ID:
D18-1151
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1192–1202
URL:
https://aclanthology.org/D18-1151
DOI:
10.18653/v1/D18-1151
Bibkey:
marvin-linzen-2018-targeted
Cite (ACL):
Rebecca Marvin and Tal Linzen. 2018. Targeted Syntactic Evaluation of Language Models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1192–1202, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Targeted Syntactic Evaluation of Language Models (Marvin & Linzen, EMNLP 2018)
PDF:
https://preview.aclanthology.org/ingestion-script-update/D18-1151.pdf
Attachment:
D18-1151.Attachment.pdf
Video:
https://vimeo.com/305208737
Code:
BeckyMarvin/LM_syneval (+ additional community code)