Rethinking Phonotactic Complexity
Abstract
In this work, we propose the use of phone-level language models to estimate phonotactic complexity, measured in bits per phoneme, which makes cross-linguistic comparison straightforward. We compare the entropy across languages using this simple measure, gaining insight into how complex different languages' phonotactics are. Finally, we show a very strong negative correlation between phonotactic complexity and the average length of words (Spearman rho = -0.744) when analysing a collection of 106 languages with 1016 basic concepts each.
- Anthology ID:
- W19-3628
- Volume:
- Proceedings of the 2019 Workshop on Widening NLP
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Amittai Axelrod, Diyi Yang, Rossana Cunha, Samira Shaikh, Zeerak Waseem
- Venue:
- WiNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 88–90
- URL:
- https://aclanthology.org/W19-3628
- Cite (ACL):
- Tiago Pimentel, Brian Roark, and Ryan Cotterell. 2019. Rethinking Phonotactic Complexity. In Proceedings of the 2019 Workshop on Widening NLP, pages 88–90, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Rethinking Phonotactic Complexity (Pimentel et al., WiNLP 2019)
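The bits-per-phoneme measure described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes a simple bigram phone-level language model with add-one smoothing, and it counts the end-of-word symbol as a predicted unit. The function names `train_bigram` and `bits_per_phoneme` and the toy word lists are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # word-boundary symbols (an assumption of this sketch)


def train_bigram(words):
    """Count phone bigrams over words given as lists of phone symbols."""
    bigrams, contexts, vocab = Counter(), Counter(), {EOS}
    for phones in words:
        seq = [BOS] + list(phones) + [EOS]
        vocab.update(phones)
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def bits_per_phoneme(words, model):
    """Average negative log2 probability per predicted symbol.

    Uses add-one smoothing over the training vocabulary. Phones never seen
    in training still receive a nonzero probability, but the distribution
    is then not strictly normalised, so the result is only a rough estimate.
    """
    bigrams, contexts, vocab = model
    total_bits, n_symbols = 0.0, 0
    for phones in words:
        seq = [BOS] + list(phones) + [EOS]
        for prev, cur in zip(seq, seq[1:]):
            p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))
            total_bits -= math.log2(p)
            n_symbols += 1
    return total_bits / n_symbols


# Toy data: each word is a list of phone symbols (made up for illustration).
train_words = [["k", "a", "t"], ["t", "a", "k"], ["a", "k", "a"]]
held_out_words = [["k", "a", "k"], ["t", "a"]]

model = train_bigram(train_words)
print(f"{bits_per_phoneme(held_out_words, model):.3f} bits per phoneme")
```

Lower values mean the next phone is easier to predict from its context, which is the sense in which a language's phonotactics would count as less complex under this measure. Evaluating on held-out words rather than the training words keeps the estimate from rewarding memorisation.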