2025
Using Information Theory to Characterize Prosodic Typology: The Case of Tone, Pitch-Accent and Stress-Accent
Ethan Wilcox | Cui Ding | Giovanni Acampa | Tiago Pimentel | Alex Warstadt | Tamar I Regev
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper argues that the relationship between lexical identity and prosody—one well-studied parameter of linguistic variation—can be characterized using information theory. We predict that languages that use prosody to make lexical distinctions should exhibit a higher mutual information between word identity and prosody, compared to languages that don’t. We test this hypothesis in the domain of pitch, which is used to make lexical distinctions in tonal languages, like Cantonese. We use a dataset of speakers reading sentences aloud in ten languages across five language families to estimate the mutual information between the text and their pitch curves. We find that, across languages, pitch curves display similar amounts of entropy. However, these curves are easier to predict given their associated text in the tonal languages, compared to pitch- and stress-accent languages, and thus the mutual information is higher in these languages, supporting our hypothesis. Our results support perspectives that view linguistic typology as gradient, rather than categorical.
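The reasoning in the abstract above can be written as a single identity. This is a sketch in notation assumed here rather than taken from the paper: $T$ stands for the text (word identity) and $P$ for the associated pitch curve.

```latex
% Assumed notation (not from the paper): T = text, P = pitch curve.
% Mutual information as the reduction in uncertainty about P once T is known.
\mathrm{I}(P; T) = \mathrm{H}(P) - \mathrm{H}(P \mid T)
```

If $\mathrm{H}(P)$ is roughly the same across languages, then a lower conditional entropy $\mathrm{H}(P \mid T)$, meaning pitch that is easier to predict from the text, implies a higher mutual information. That is the pattern the abstract reports for the tonal languages.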
The time scale of redundancy between prosody and linguistic context
Tamar I Regev | Chiebuka Ohams | Shaylee Xie | Lukas Wolf | Evelina Fedorenko | Alex Warstadt | Ethan Wilcox | Tiago Pimentel
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In spoken communication, information is transmitted not only via words, but also through a rich array of non-verbal signals, including prosody—the non-segmental auditory features of speech. Do these different communication channels carry distinct information? Prior work has shown that the information carried by prosodic features is substantially redundant with that carried by the surrounding words. Here, we systematically examine the time scale of this relationship, studying how it varies with the length of past and future contexts. We find that a word’s prosodic features require an extended past context (3-8 words across different features) to be reliably predicted. Given that long-scale contextual information decays in memory, prosody may facilitate communication by adding information that is locally unique. We also find that a word’s prosodic features show some redundancy with future words, but only with a short scale of 1-2 words, consistent with reports of incremental short-term planning in language production. Thus, prosody may facilitate communication by helping listeners predict upcoming material. In tandem, our results highlight potentially distinct roles that prosody plays in facilitating integration of words into past contexts and in helping predict upcoming words.
2023
WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words
Lukas Wolf | Klemen Kotar | Greta Tuckute | Eghbal Hosseini | Tamar I. Regev | Ethan Gotlieb Wilcox | Alexander Scott Warstadt
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning