2025
pdf
bib
abs
What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length
Lindia Tjuatja
|
Graham Neubig
|
Tal Linzen
|
Sophie Hao
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
When comparing the linguistic capabilities of language models (LMs) with humans using LM probabilities, factors such as the length of the sequence and the unigram frequency of lexical items have a significant effect on LM probabilities in ways that humans are largely robust to. Prior works in comparing LM and human acceptability judgments treat these effects uniformly across models, making a strong assumption that models require the same degree of adjustment to control for length and unigram frequency effects. We propose MORCELA, a new linking theory between LM scores and acceptability judgments where the optimal level of adjustment for these effects is estimated from data via learned parameters for length and unigram frequency. We first show that MORCELA outperforms a commonly used linking theory for acceptability—SLOR (Pauls and Klein, 2012; Lau et al., 2017)—across two families of transformer LMs (Pythia and OPT). Furthermore, we demonstrate that the assumed degrees of adjustment in SLOR for length and unigram frequency overcorrect for these confounds, and that larger models require a lower relative degree of adjustment for unigram frequency, though a significant amount of adjustment is still necessary for all models. Finally, our subsequent analysis shows that larger LMs’ lower susceptibility to frequency effects can be explained by an ability to better predict rarer words in context.
pdf
bib
abs
ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors
Qinchan Li
|
Sophie Hao
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
In languages without orthographic word boundaries, NLP models perform _word segmentation_, either as an explicit preprocessing step or as an implicit step in an end-to-end computation. This paper shows that Chinese NLP models are vulnerable to _morphological garden path errors_—errors caused by a failure to resolve local word segmentation ambiguities using sentence-level morphosyntactic context. We propose a benchmark, _ERAS_, that tests a model’s vulnerability to morphological garden path errors by comparing its behavior on sentences with and without local segmentation ambiguities. Using ERAS, we show that word segmentation models make morphological garden path errors on locally ambiguous sentences, but do not make equivalent errors on unambiguous sentences. We further show that sentiment analysis models with character-level tokenization make implicit garden path errors, even without an explicit word segmentation step in the pipeline. Our results indicate that models’ segmentation of Chinese text often fails to account for morphosyntactic context.
2024
pdf
bib
abs
Universal Generation for Optimality Theory Is PSPACE-Complete
Sophie Hao
Computational Linguistics, Volume 50, Issue 1 - March 2024
This article shows that the universal generation problem for Optimality Theory (OT) is PSPACE-complete. While prior work has shown that universal generation is at least NP-hard and at most EXPSPACE-hard, our results place universal generation in between those two classes, assuming that NP ≠ PSPACE. We additionally show that when the number of constraints is bounded in advance, universal generation is at least NL-hard and at most NPNP-hard. Our proofs rely on a close connection between OT and the intersection non-emptiness problem for finite automata, which is PSPACE-complete in general and NL-complete when the number of automata is bounded. Our analysis shows that constraint interaction is the main contributor to the complexity of OT: The ability to factor transformations into simple, interacting constraints allows OT to furnish compact descriptions of intricate phonological phenomena.
pdf
bib
abs
Reflecting the Male Gaze: Quantifying Female Objectification in 19th and 20th Century Novels
Kexin Luo
|
Yue Mao
|
Bei Zhang
|
Sophie Hao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Inspired by the concept of the male gaze (Mulvey, 1975) in literature and media studies, this paper proposes a framework for analyzing gender bias in terms of female objectification—the extent to which a text portrays female individuals as objects of visual pleasure. Our framework measures female objectification along two axes. First, we compute an agency bias score that indicates whether male entities are more likely to appear in the text as grammatical agents than female entities. Next, by analyzing the word embedding space induced by a text (Caliskan et al., 2017), we compute an appearance bias score that indicates whether female entities are more closely associated with appearance-related words than male entities. Applying our framework to 19th and 20th century novels reveals evidence of female objectification in literature: we find that novels written from a male perspective systematically objectify female characters, while novels written from a female perspective do not exhibit statistically significant objectification of any gender.
2023
pdf
bib
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Yonatan Belinkov
|
Sophie Hao
|
Jaap Jumelet
|
Najoung Kim
|
Arya McCarthy
|
Hosein Mohebbi
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
pdf
bib
abs
Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number
Sophie Hao
|
Tal Linzen
Findings of the Association for Computational Linguistics: EMNLP 2023
Deep architectures such as Transformers are sometimes criticized for having uninterpretable “black-box” representations. We use causal intervention analysis to show that, in fact, some linguistic features are represented in a linear, interpretable format. Specifically, we show that BERT’s ability to conjugate verbs relies on a linear encoding of subject number that can be manipulated with predictable effects on conjugation accuracy. This encoding is found in the subject position at the first layer and the verb position at the last layer, but distributed across positions at middle layers, particularly when there are multiple cues to subject number.