Adele Goldberg

Also published as: Adele E. Goldberg


2025

pdf bib
Linguistic Generalizations are not Rules: Impacts on Evaluation of LMs
Leonie Weissweiler | Kyle Mahowald | Adele E. Goldberg
Proceedings of the Second International Workshop on Construction Grammars and NLP

Linguistic evaluations of how well LMs generalize to produce or understand novel text often implicitly take for granted that natural languages are generated by symbolic rules. Grammaticality is thought to be determined by whether sentences obey such rules. Interpretation is believed to be compositionally generated by syntactic rules operating on meaningful words. Semantic parsing is intended to map sentences into formal logic. Failures of LMs to obey strict rules have been taken to reveal that LMs do not produce or understand language like humans. Here we suggest that LMs’ failures to obey symbolic rules may be a feature rather than a bug, because natural languages are not based on rules. New utterances are produced and understood by a combination of flexible, interrelated, and context-dependent constructions. We encourage researchers to reimagine appropriate benchmarks and analyses that acknowledge the rich, flexible generalizations that comprise natural languages.

pdf bib
Meaning-Infused Grammar: Gradient Acceptability Shapes the Geometric Representations of Constructions in LLMs
Supantho Rakshit | Adele E. Goldberg
Proceedings of the Second International Workshop on Construction Grammars and NLP

The usage-based constructionist (UCx) approach to language posits that language comprises a network of learned form-meaning pairings (constructions) whose use is largely determined by their meanings or functions, requiring them to be graded and probabilistic. This study investigates whether the internal representations in Large Language Models (LLMs) reflect the proposed function-infused gradience. We analyze representations of the English Double Object (DO) and Prepositional Object (PO) constructions in Pythia-1.4B, using a dataset of 5000 sentence pairs systematically varied by human-rated preference strength for DO or PO. Geometric analyses show that the separability between the two constructions’ representations, as measured by energy distance or Jensen-Shannon divergence, is systematically modulated by gradient preference strength, which depends on lexical and functional properties of sentences. That is, more prototypical exemplars of each construction occupy more distinct regions in activation space than sentences that could equally well have occurred in either construction. These results provide evidence that LLMs learn rich, meaning-infused, graded representations of constructions and offer support for using geometric measures to study representations in LLMs.
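As an illustration of the kind of geometric analysis described above, here is a minimal sketch of the energy distance between two samples of activation vectors (e.g., pooled hidden states for DO vs. PO sentences). The random toy data and vector dimensionality are placeholders, not the paper's Pythia-1.4B activations or its exact pipeline.

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_distance(X, Y):
    """Energy distance between two samples of activation vectors:
    2*E||X - Y|| - E||X - X'|| - E||Y - Y'||."""
    d_xy = cdist(X, Y).mean()  # mean cross-sample distance
    d_xx = cdist(X, X).mean()  # mean within-sample distance (sample X)
    d_yy = cdist(Y, Y).mean()  # mean within-sample distance (sample Y)
    return 2 * d_xy - d_xx - d_yy

# Toy stand-ins for DO vs. PO sentence activations; higher values
# indicate more separable (more distinct) regions in activation space.
rng = np.random.default_rng(0)
do_acts = rng.normal(0.0, 1.0, size=(100, 64))
po_acts = rng.normal(0.5, 1.0, size=(100, 64))
print(energy_distance(do_acts, po_acts))
```

Comparing this statistic across bins of human-rated preference strength is one simple way to test whether separability grows with prototypicality.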

2023

pdf bib
Causal interventions expose implicit situation models for commonsense language understanding
Takateru Yamakoshi | James McClelland | Adele Goldberg | Robert Hawkins
Findings of the Association for Computational Linguistics: ACL 2023

Accounts of human language processing have long appealed to implicit “situation models” that enrich comprehension with relevant but unstated world knowledge. Here, we apply causal intervention techniques to recent transformer models to analyze performance on the Winograd Schema Challenge (WSC), where a single context cue shifts interpretation of an ambiguous pronoun. We identify a relatively small circuit of attention heads responsible for propagating information from the context word, which guides which of the candidate noun phrases the pronoun ultimately attends to. We then compare how this circuit behaves in a closely matched “syntactic” control where the situation model is not strictly necessary. These analyses suggest a distinct pathway through which implicit situation models may be constructed to guide pronoun resolution.
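The following is a minimal sketch of an activation-patching style intervention with HuggingFace GPT-2, in the spirit of the causal interventions described above. The layer index is an arbitrary illustration, the patch is applied to the whole attention-block output rather than to individual heads, and the minimal pair must tokenize to the same length for the swap to be valid; none of these choices reflect the specific circuit reported in the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 8  # illustrative layer, not the circuit identified in the paper

def run_with_cache(text):
    """Run the model and cache the attention-block output at LAYER."""
    cache = {}
    def hook(module, inputs, output):
        cache["attn_out"] = output[0].detach()
    handle = model.transformer.h[LAYER].attn.register_forward_hook(hook)
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits
    handle.remove()
    return logits, cache["attn_out"]

def run_with_patch(text, patched_attn_out):
    """Re-run the model, swapping in attention output cached from the
    counterfactual context (both contexts must have equal token length)."""
    def hook(module, inputs, output):
        return (patched_attn_out,) + output[1:]
    handle = model.transformer.h[LAYER].attn.register_forward_hook(hook)
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits
    handle.remove()
    return logits

# Hypothetical Winograd-style minimal pair differing in one context cue.
base = "The trophy did not fit in the suitcase because it was too large."
alt  = "The trophy did not fit in the suitcase because it was too small."
_, alt_attn = run_with_cache(alt)
patched_logits = run_with_patch(base, alt_attn)  # compare to the unpatched run
```

Comparing patched and unpatched logits at the critical position indicates how much the intervened component contributes to the pronoun's interpretation.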

2020

pdf bib
Investigating representations of verb bias in neural language models
Robert Hawkins | Takateru Yamakoshi | Thomas Griffiths | Adele Goldberg
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Languages typically provide more than one grammatical construction to express certain types of messages. A speaker’s choice of construction is known to depend on multiple factors, including the choice of main verb – a phenomenon known as verb bias. Here we introduce DAIS, a large benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternation. This dataset includes 200 unique verbs and systematically varies the definiteness and length of arguments. We use this dataset, as well as an existing corpus of naturally occurring data, to evaluate how well recent neural language models capture human preferences. Results show that larger models perform better than smaller models, and transformer architectures (e.g., GPT-2) tend to outperform recurrent architectures (e.g., LSTMs) even under comparable parameter and training settings. Additional analyses of internal feature representations suggest that transformers may better integrate specific lexical information with grammatical constructions.
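A minimal sketch of the kind of preference evaluation described above: score each member of a dative-alternation pair with GPT-2 by summing token log-probabilities and see which construction the model prefers. The example pair is illustrative and not drawn from DAIS itself.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence):
    """Total log-probability GPT-2 assigns to a sentence."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # each position predicts the next token
    targets = ids[0, 1:]
    return log_probs[torch.arange(targets.size(0)), targets].sum().item()

# Illustrative double-object (DO) vs. prepositional-object (PO) pair.
do_sent = "The teacher gave the student a book."
po_sent = "The teacher gave a book to the student."
preference = sentence_logprob(do_sent) - sentence_logprob(po_sent)
print("model prefers DO" if preference > 0 else "model prefers PO")
```

Correlating such model preference scores with human ratings in a benchmark like DAIS is one straightforward way to measure how well a language model captures verb bias.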

2019

pdf bib
Modeling the Acquisition of Words with Multiple Meanings
Libby Barak | Sammy Floyd | Adele Goldberg
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

bib
Polysemous Language in Child Directed Speech
Sammy Floyd | Libby Barak | Adele Goldberg | Casey Lew-Williams
Proceedings of the 2019 Workshop on Widening NLP

Learning the meaning of words is one of the fundamental building blocks of verbal communication. Models of child language acquisition have generally made the simplifying assumption that each word appears in child-directed speech with a single meaning. To understand naturalistic word learning during childhood, it is essential to know whether children hear input that is in fact constrained to a single meaning per word, or whether the environment naturally contains multiple senses. In this study, we use a topic modeling approach to automatically induce word senses from child-directed speech. Our results confirm the plausibility of our automated analysis approach and reveal an increasing rate of multiple senses per word in child-directed speech, starting with corpora from children as early as the first year of life.
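A minimal sketch of inducing candidate word senses with a topic model, under the simplifying assumption that each occurrence's surrounding words form one pseudo-document. The toy utterances, the target word, and the use of scikit-learn's LDA are illustrative stand-ins for the paper's actual corpus and pipeline.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy child-directed utterances containing the polysemous target "bat";
# a real analysis would use CHILDES-style transcripts instead.
contexts = [
    "the bat flew out of the cave at night",
    "swing the bat and hit the ball",
    "a bat sleeps upside down in the tree",
    "hold the bat with both hands when you play",
]

# Bag-of-words representation of each occurrence's context.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(contexts)

# Two topics act as a stand-in for two candidate senses of the target word.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"candidate sense {k}: {top}")
```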

bib
Evaluating Ways of Adapting Word Similarity
Libby Barak | Adele Goldberg
Proceedings of the 2019 Workshop on Widening NLP

People judge pairwise similarity by deciding which aspects of the words’ meanings are relevant for the comparison of the given pair. However, computational representations of meaning rely on fixed dimensions of the vector representation for similarity comparisons, without considering the specific pairing at hand. Prior work has addressed this limitation by adapting computational similarity judgments with the softmax function, capturing the asymmetry in human judgments. We extend this analysis by showing that a simple modification of cosine similarity offers a better correlation with human judgments over a comprehensive dataset. The modification performs best when the similarity between two words is calculated with reference to the other words that are most similar and most dissimilar to the pair.
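The abstract leaves the exact formula unspecified, so the following is only one plausible reading of it: rescale the raw cosine of a pair relative to each word's most and least similar reference words in the vocabulary. The function name, the choice of k, and the min-max rescaling are all assumptions for illustration, not the authors' published modification.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def adapted_similarity(w1, w2, vocab_vectors, k=10):
    """Rescale cos(w1, w2) relative to the most similar and most dissimilar
    reference words for the pair (a hypothetical variant, not the paper's)."""
    raw = cosine(w1, w2)
    reference = []
    for w in (w1, w2):
        sims = np.sort([cosine(w, v) for v in vocab_vectors])
        reference.append(sims[-k:].mean())  # most similar reference words
        reference.append(sims[:k].mean())   # most dissimilar reference words
    hi, lo = max(reference), min(reference)
    return (raw - lo) / (hi - lo + 1e-12)   # similarity relative to the pair's reference range
```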

bib
Context Effects on Human Judgments of Similarity
Libby Barak | Noe Kong-Johnson | Adele Goldberg
Proceedings of the 2019 Workshop on Widening NLP

The semantic similarity of words forms the basis of many natural language processing methods. These computational similarity measures are often based on a mathematical comparison of vector representations of word meanings, while human judgments of similarity differ in lacking geometric properties such as symmetry and the triangle inequality. In this study, we propose a novel task design to further explore human behavior by asking whether a pair of words is deemed more similar depending on an immediately preceding judgment. Results from a crowdsourcing experiment show that people consistently judge words as more similar when primed by a judgment that evokes a relevant relationship. Our analysis further shows that word2vec similarity correlates significantly better with out-of-context judgments, confirming the methodological differences between human and computational judgments and offering a new testbed for probing those differences.
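A minimal sketch of the word2vec comparison mentioned above: correlate model cosine similarities with human ratings using Spearman's rho. The pretrained vectors are a standard gensim download, and the word pairs and ratings are placeholder values, not the study's crowdsourced data.

```python
import gensim.downloader as api
from scipy.stats import spearmanr

# Pretrained word2vec vectors (large download); any KeyedVectors model works.
wv = api.load("word2vec-google-news-300")

# Placeholder word pairs and hypothetical human ratings in and out of context.
pairs = [("car", "truck"), ("cup", "mug"), ("bird", "plane"), ("coffee", "tea")]
in_context_ratings = [0.90, 0.95, 0.60, 0.85]
out_of_context_ratings = [0.80, 0.90, 0.30, 0.70]

model_sims = [wv.similarity(a, b) for a, b in pairs]
print("rho vs. in-context:", spearmanr(model_sims, in_context_ratings).correlation)
print("rho vs. out-of-context:", spearmanr(model_sims, out_of_context_ratings).correlation)
```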

2016

pdf bib
Comparing Computational Cognitive Models of Generalization in a Language Acquisition Task
Libby Barak | Adele E. Goldberg | Suzanne Stevenson
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

pdf bib
Leveraging Preposition Ambiguity to Assess Compositional Distributional Models of Semantics
Samuel Ritter | Cotie Long | Denis Paperno | Marco Baroni | Matthew Botvinick | Adele Goldberg
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics