Phone Inventories and Recognition for Every Language
Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black, Shinji Watanabe
Abstract
Identifying phone inventories is a crucial component of language documentation and the preservation of endangered languages. However, even the largest collection of phone inventories covers only about 2000 languages, which is only a quarter of the total number of languages in the world. A majority of the remaining languages are endangered. In this work, we attempt to solve this problem by estimating the phone inventory for any language listed in Glottolog, which contains phylogenetic information on 8000 languages. In particular, we propose one probabilistic model and one non-probabilistic model, both using phylogenetic trees (“language family trees”) to measure the distance between languages. We show that our best model outperforms baseline models by 6.5 F1. Furthermore, we demonstrate that, with the proposed inventories, the phone recognition model can be customized for every language in the set, which improves the phone error rate (PER) in phone recognition by 25%.
- Anthology ID: 2022.lrec-1.114
- Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month: June
- Year: 2022
- Address: Marseille, France
- Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue: LREC
- Publisher: European Language Resources Association
- Pages: 1061–1067
- URL: https://aclanthology.org/2022.lrec-1.114
- Cite (ACL): Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black, and Shinji Watanabe. 2022. Phone Inventories and Recognition for Every Language. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1061–1067, Marseille, France. European Language Resources Association.
- Cite (Informal): Phone Inventories and Recognition for Every Language (Li et al., LREC 2022)
- PDF: https://preview.aclanthology.org/ingest-2024-clasp/2022.lrec-1.114.pdf
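
The abstract describes using phylogenetic trees to measure the distance between languages and scoring estimated inventories with F1. The sketch below illustrates that general idea only and is not the paper's model: it assumes a toy tree stored as a child-to-parent map, made-up inventories, a path-length distance, and a simple majority vote over the nearest languages with known inventories. All names (`PARENT`, `KNOWN`, `tree_distance`, `estimate_inventory`, `f1`) are hypothetical.

```python
from collections import Counter

# Hypothetical toy family tree: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "fam_a": "root",
    "fam_b": "root",
    "lang1": "fam_a",
    "lang2": "fam_a",
    "lang3": "fam_b",
    "target": "fam_a",
}

# Made-up phone inventories for the languages assumed to be documented.
KNOWN = {
    "lang1": {"p", "t", "k", "a", "i"},
    "lang2": {"p", "t", "a", "i", "u"},
    "lang3": {"b", "d", "g", "e", "o"},
}


def ancestors(node, parent=PARENT):
    """Return the path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def tree_distance(a, b, parent=PARENT):
    """Number of edges on the tree path between a and b."""
    steps_from_a = {n: i for i, n in enumerate(ancestors(a, parent))}
    for j, n in enumerate(ancestors(b, parent)):
        if n in steps_from_a:  # first shared ancestor on b's path
            return steps_from_a[n] + j
    raise ValueError("nodes are not in the same tree")


def estimate_inventory(target, known=KNOWN, k=2, parent=PARENT):
    """Keep phones found in a strict majority of the k nearest known languages."""
    nearest = sorted(
        known, key=lambda lang: (tree_distance(target, lang, parent), lang)
    )[:k]
    votes = Counter(ph for lang in nearest for ph in known[lang])
    return {ph for ph, count in votes.items() if count * 2 > len(nearest)}


def f1(predicted, gold):
    """F1 between a predicted and a gold phone set."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = estimate_inventory("target")
    gold = {"p", "t", "k", "a", "i"}  # made-up reference inventory
    print(sorted(predicted), round(f1(predicted, gold), 3))
```

In this toy setup the two languages in the same family as `target` are closer than the one in the other family, so only their inventories vote. A real system would read the tree from Glottolog and would likely weight votes by distance rather than use a hard cutoff.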