Leon Bergen

2020

Predicting Reference: What do Language Models Learn about Discourse Models?
Shiva Upadhye | Leon Bergen | Andrew Kehler
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Whereas there is a growing literature that probes neural language models to assess the degree to which they have latently acquired grammatical knowledge, little if any research has investigated their acquisition of discourse modeling ability. We address this question by drawing on a rich psycholinguistic literature that has established how different contexts affect referential biases concerning who is likely to be referred to next. The results reveal that, for the most part, the prediction behavior of neural language models does not resemble that of human language users.

Word Frequency Does Not Predict Grammatical Knowledge in Language Models
Charles Yu | Ryan Sie | Nicolas Tedeschi | Leon Bergen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Neural language models learn, to varying degrees of accuracy, the grammatical properties of natural languages. In this work, we investigate whether there are systematic sources of variation in the language models’ accuracy. Focusing on subject-verb agreement and reflexive anaphora, we find that certain nouns are systematically understood better than others, an effect which is robust across grammatical tasks and different language models. Surprisingly, we find that across four orders of magnitude, corpus frequency is unrelated to a noun’s performance on grammatical tasks. Finally, we find that a novel noun’s grammatical properties can be few-shot learned from various types of training data. The results present a paradox: there should be less variation in grammatical performance than is actually observed.

Speakers enhance contextually confusable words
Eric Meinhardt | Eric Bakovic | Leon Bergen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent work has found evidence that natural languages are shaped by pressures for efficient communication — e.g. the more contextually predictable a word is, the fewer speech sounds or syllables it has (Piantadosi et al. 2011). Research on the degree to which speech and language are shaped by pressures for effective communication — robustness in the face of noise and uncertainty — has been more equivocal. We develop a measure of contextual confusability during word recognition based on psychoacoustic data. Applying this measure to naturalistic speech corpora, we find evidence suggesting that speakers alter their productions to make contextually more confusable words easier to understand.

2019

Constraint-based Learning of Phonological Processes
Shraddha Barke | Rose Kunkel | Nadia Polikarpova | Eric Meinhardt | Eric Bakovic | Leon Bergen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Phonological processes are context-dependent sound changes in natural languages. We present an unsupervised approach to learning human-readable descriptions of phonological processes from collections of related utterances. Our approach builds upon a technique from the programming languages community called *constraint-based program synthesis*. We contribute a novel encoding of the learning problem into Boolean Satisfiability constraints, which enables both data efficiency and fast inference. We evaluate our system on textbook phonology problems and datasets from the literature, and show that it achieves high accuracy at interactive speeds.