Assessing Pre-Built Speaker Recognition Models for Endangered Language Data

Gina-Anne Levow


Abstract
Significant research has focused on speaker recognition, the task of determining who is speaking in a segment of audio. However, few experiments have investigated speaker recognition for very low-resource or endangered languages, even though it has the potential to support language documentation and revitalization efforts by making recordings more accessible to researchers and communities. Since endangered language datasets are too small to build competitive speaker representations from scratch, we investigate the application of large-scale pre-built speaker recognition models to bridge this gap. This paper compares four speaker recognition models on six diverse endangered language datasets, contrasting three recent neural network-based x-vector models with an earlier baseline i-vector model. Experiments demonstrate significantly stronger performance for some of the studied models. Further analysis highlights differences in effectiveness tied to the length of test audio segments and the amount of data used for speaker modeling.
Anthology ID:
2024.sigul-1.4
Volume:
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Maite Melero, Sakriani Sakti, Claudia Soria
Venues:
SIGUL | WS
Publisher:
ELRA and ICCL
Pages:
27–32
URL:
https://aclanthology.org/2024.sigul-1.4
Cite (ACL):
Gina-Anne Levow. 2024. Assessing Pre-Built Speaker Recognition Models for Endangered Language Data. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 27–32, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Assessing Pre-Built Speaker Recognition Models for Endangered Language Data (Levow, SIGUL-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.sigul-1.4.pdf