Lukas Wolf


2025

The time scale of redundancy between prosody and linguistic context
Tamar I Regev | Chiebuka Ohams | Shaylee Xie | Lukas Wolf | Evelina Fedorenko | Alex Warstadt | Ethan Wilcox | Tiago Pimentel
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In spoken communication, information is transmitted not only via words, but also through a rich array of non-verbal signals, including prosody—the non-segmental auditory features of speech. Do these different communication channels carry distinct information? Prior work has shown that the information carried by prosodic features is substantially redundant with that carried by the surrounding words. Here, we systematically examine the time scale of this relationship, studying how it varies with the length of past and future contexts. We find that a word’s prosodic features require an extended past context (3-8 words across different features) to be reliably predicted. Given that long-scale contextual information decays in memory, prosody may facilitate communication by adding information that is locally unique. We also find that a word’s prosodic features show some redundancy with future words, but only with a short scale of 1-2 words, consistent with reports of incremental short-term planning in language production. Thus, prosody may facilitate communication by helping listeners predict upcoming material. In tandem, our results highlight potentially distinct roles that prosody plays in facilitating integration of words into past contexts and in helping predict upcoming words.

2023

WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words
Lukas Wolf | Klemen Kotar | Greta Tuckute | Eghbal Hosseini | Tamar I. Regev | Ethan Gotlieb Wilcox | Alexander Scott Warstadt
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning

Quantifying the redundancy between prosody and text
Lukas Wolf | Tiago Pimentel | Evelina Fedorenko | Ryan Cotterell | Alex Warstadt | Ethan Wilcox | Tamar Regev
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Prosody—the suprasegmental component of speech, including pitch, loudness, and tempo—carries critical aspects of meaning. However, the relationship between the information conveyed by prosody vs. by the words themselves remains poorly understood. We use large language models (LLMs) to estimate how much information is redundant between prosody and the words themselves. Using a large spoken corpus of English audiobooks, we extract prosodic features aligned to individual words and test how well they can be predicted from LLM embeddings, compared to non-contextual word embeddings. We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features, including intensity, duration, pauses, and pitch contours. Furthermore, a word’s prosodic information is redundant with both the word itself and the context preceding as well as following it. Still, we observe that prosodic features can not be fully predicted from text, suggesting that prosody carries information above and beyond the words. Along with this paper, we release a general-purpose data processing pipeline for quantifying the relationship between linguistic information and extra-linguistic features.