Abstract
Significant research has focused on speaker recognition, determining which speaker is speaking in a segment of audio. However, few experiments have investigated speaker recognition for very low-resource or endangered languages. Furthermore, speaker recognition has the potential to support language documentation and revitalization efforts, making recordings more accessible to researchers and communities. Since endangered language datasets are too small to build competitive speaker representations from scratch, we investigate the application of large-scale pre-built speaker recognition models to bridge this gap. This paper compares four speaker recognition models on six diverse endangered language data sets. Comparisons contrast three recent neural network-based x-vector models and an earlier baseline i-vector model. Experiments demonstrate significantly stronger performance for some of the studied models. Further analysis highlights differences in effectiveness tied to the lengths of test audio segments and amount of data used for speaker modeling.

- Anthology ID:
- 2024.sigul-1.4
- Volume:
- Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Maite Melero, Sakriani Sakti, Claudia Soria
- Venues:
- SIGUL | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 27–32
- URL:
- https://aclanthology.org/2024.sigul-1.4
- Cite (ACL):
- Gina-Anne Levow. 2024. Assessing Pre-Built Speaker Recognition Models for Endangered Language Data. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 27–32, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Assessing Pre-Built Speaker Recognition Models for Endangered Language Data (Levow, SIGUL-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.sigul-1.4.pdf