Whitney Poh

2026

Euphemisms are words or phrases used to soften or indirectly refer to taboo or sensitive topics. They pose interpretation challenges because the same expression may appear in different senses depending on context: literal, figurative but non-euphemistic, or euphemistic. For example, "pull the plug" may refer euphemistically to ending a patient’s life support, figuratively to canceling a project or funding, or literally to unplugging a device. Euphemisms also vary across languages and cultures in both their surface forms and the contexts in which they are conventionally used. Previous work introduced datasets for the computational study of euphemisms in five languages. We extend this line of work by introducing two new annotated datasets for euphemism detection in Polish and Ukrainian and by standardizing resources for all seven languages into a unified benchmark format that supports cross-lingual evaluation. Finally, we provide zero-shot and few-shot baselines using GPT-5-nano. We run each configuration five times and report the average score, establishing reference scores for multilingual pragmatic understanding. We also perform pilot tests using Qwen3-4B on the English and Chinese datasets.

2025


What did you say? Generating Child-Directed Speech Questions to Train LLMs
Whitney Poh | Michael Tombolini | Libby Barak
Proceedings of the First BabyLM Workshop

Child-Directed Speech (CDS) holds unique linguistic properties that distinguish it from other types of textual corpora. Language models trained on CDS often obtain superior results compared with models trained on the same amount of other types of data. Several studies have aimed at modifying non-CDS data to mimic the linguistic properties of CDS and so match its hypothesized advantages. Here, we propose to adapt the non-CDS portions of the training data to include questions similar to those in CDS interaction. We modify the data by adding artificially generated questions and methodically analyze the change in performance using each modified dataset. Our results show that the effect of artificial question generation strongly depends on the properties of the original dataset. While performance improves on question-related measures, overall performance is negatively affected as a result of the reduced syntactic diversity.

Co-authors

Jing Peng 1

Julia Sammartino 1

Michael Tombolini 1

Natalia Zawadzka-Paluektau 1

Venues

BabyLM 1

LREC 1
