A surprisal–duration trade-off across and within the world’s languages

Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, Ryan Cotterell


Abstract
While there exist scores of natural languages, each with its unique features and idiosyncrasies, they all share a unifying theme: enabling human communication. We may thus reasonably predict that human cognition shapes how these languages evolve and are used. Assuming that the capacity to process information is roughly constant across human populations, we expect a surprisal–duration trade-off to arise both across and within languages. We analyse this trade-off using a corpus of 600 languages and, after controlling for several potential confounds, we find strong supporting evidence in both settings. Specifically, we find that, on average, phones are produced faster in languages where they are less surprising, and vice versa. Further, we confirm that more surprising phones are longer, on average, in 319 languages out of the 600. We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world’s languages.
Anthology ID:
2021.emnlp-main.73
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
949–962
URL:
https://aclanthology.org/2021.emnlp-main.73
DOI:
10.18653/v1/2021.emnlp-main.73
Cite (ACL):
Tiago Pimentel, Clara Meister, Elizabeth Salesky, Simone Teufel, Damián Blasi, and Ryan Cotterell. 2021. A surprisal–duration trade-off across and within the world’s languages. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 949–962, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
A surprisal–duration trade-off across and within the world’s languages (Pimentel et al., EMNLP 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2021.emnlp-main.73.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-5/2021.emnlp-main.73.mp4
Code:
rycolab/surprisal-duration-tradeoff
Data:
VoxClamantis