Barend Beekhuizen
2025

Do language models practice what they preach? Examining language ideologies about gendered language reform encoded in LLMs
Julia Watson | Sophia S. Lee | Barend Beekhuizen | Suzanne Stevenson
Proceedings of the 31st International Conference on Computational Linguistics

We study language ideologies in text produced by LLMs through a case study on English gendered language reform (related to role nouns like congressperson/-woman/-man, and singular they). First, we find political bias: when asked to use language that is “correct” or “natural”, LLMs use language most similarly to when asked to align with conservative (vs. progressive) values. This shows how LLMs’ metalinguistic preferences can implicitly communicate the language ideologies of a particular political group, even in seemingly non-political contexts. Second, we find LLMs exhibit internal inconsistency: LLMs use gender-neutral variants more often when more explicit metalinguistic context is provided. This shows how the language ideologies expressed in text produced by LLMs can vary, which may be unexpected to users. We discuss the broader implications of these findings for value alignment.

Spatial relation marking across languages: extraction, evaluation, analysis
Barend Beekhuizen
Proceedings of the 29th Conference on Computational Natural Language Learning

This paper presents a novel task, detecting Spatial Relation Markers (SRMs, like English _**in** the bag_), across languages, alongside a model for this task, RUIMTE. Using a massively parallel corpus of Bible translations, the model is evaluated against existing and baseline models on the basis of a novel evaluation set. The model achieves high-quality SRM extraction and accurately identifies situations where languages have zero-marked SRMs.

Vorm: Translations and a constrained hypothesis space support unsupervised morphological segmentation across languages
Barend Beekhuizen
Proceedings of the 29th Conference on Computational Natural Language Learning

This paper introduces Vorm, an unsupervised morphological segmentation system that leverages translation data to infer highly accurate morphological transformations, including less frequently modeled processes such as infixation and reduplication. The system is evaluated on standard benchmark data and a novel, typologically diverse dataset of 37 languages. Model performance is competitive and sometimes superior on canonical segmentation, but more limited on surface segmentation.

A discovery procedure for synlexification patterns in the world’s languages
Hannah S. Rognan | Barend Beekhuizen
Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

Synlexification is the pattern of crosslinguistic lexical semantic variation whereby what is expressed in a single word in one language is expressed in multiple words in another (e.g., French ‘monter’ vs. English ‘go+up’). We introduce a computational method for automatically extracting instances of synlexification from a parallel corpus at a large scale (many languages, many domains). The method involves debiasing the seed language by splitting up synlexifications in the seed language where other languages consistently split them. The method was applied to a massively parallel corpus of 198 Bible translations. We validate it on a broad sample of cases and demonstrate its potential for typological research.

Token-level semantic typology without a massively parallel corpus
Barend Beekhuizen
Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

This paper presents a computational method for token-level lexical semantic comparative research in an original text setting, as opposed to the more common massively parallel setting. Given a set of (non-massively parallel) bitexts, the method consists of leveraging pre-trained contextual vectors in a reference language to induce, for a token in one target language, the lexical items that all other target languages would have used, thus simulating a massively parallel set-up. The method is evaluated on its extraction and induction quality, and the use of the method for lexical semantic typological research is demonstrated.

2024

Using a Language Model to Unravel Semantic Development in Children’s Use of a Dutch Perception Verb
Bram van Dijk | Max J. van Duijn | Li Kloostra | Marco Spruit | Barend Beekhuizen
Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024

In this short paper we employ a Language Model (LM) to gain insight into how the complex semantics of a Perception Verb (PV) emerge in children. Using a Dutch LM as a representation of mature language use, we find that for all ages 1) the LM accurately predicts PV use in children’s freely told narratives; 2) children’s PV use is close to mature use; 3) complex PV meanings with attentional and cognitive aspects can be found. Our approach illustrates how LMs can be meaningfully employed in studying language development, and hence takes a constructive position in the debate on the relevance of LMs in this context.

2023

What social attitudes about gender does BERT encode? Leveraging insights from psycholinguistics
Julia Watson | Barend Beekhuizen | Suzanne Stevenson
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Much research has sought to evaluate the degree to which large language models reflect social biases. We complement such work with an approach to elucidating the connections between language model predictions and people’s social attitudes. We show how word preferences in a large language model reflect social attitudes about gender, using two datasets from human experiments that found differences in gendered or gender neutral word choices by participants with differing views on gender (progressive, moderate, or conservative). We find that the language model BERT takes into account factors that shape human lexical choice of such language, but may not weigh those factors in the same way people do. Moreover, we show that BERT’s predictions most resemble responses from participants with moderate to conservative views on gender. Such findings illuminate how a language model: (1) may differ from people in how it deploys words that signal gender, and (2) may prioritize some social attitudes over others.

2022

Remodelling complement coercion interpretation
Frederick Gietz | Barend Beekhuizen
Proceedings of the Society for Computation in Linguistics 2022

2021

A Formidable Ability: Detecting Adjectival Extremeness with DSMs
Farhan Samir | Barend Beekhuizen | Suzanne Stevenson
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2019

Say Anything: Automatic Semantic Infelicity Detection in L2 English Indefinite Pronouns
Ella Rabinovich | Julia Watson | Barend Beekhuizen | Suzanne Stevenson
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Computational research on error detection in second language speakers has mainly addressed clear grammatical anomalies typical to learners at the beginner-to-intermediate level. We focus instead on acquisition of subtle semantic nuances of English indefinite pronouns by non-native speakers at varying levels of proficiency. We first lay out theoretical, linguistically motivated hypotheses, and supporting empirical evidence, on the nature of the challenges posed by indefinite pronouns to English learners. We then suggest and evaluate an automatic approach for detection of atypical usage patterns, demonstrating that deep learning architectures are promising for this task involving nuanced semantic anomalies.

2015

Perceptual, conceptual, and frequency effects on error patterns in English color term acquisition
Barend Beekhuizen | Suzanne Stevenson
Proceedings of the Sixth Workshop on Cognitive Aspects of Computational Language Learning

2014

A Usage-Based Model of Early Grammatical Development
Barend Beekhuizen | Rens Bod | Afsaneh Fazly | Suzanne Stevenson | Arie Verhagen
Proceedings of the Fifth Workshop on Cognitive Modeling and Computational Linguistics