An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems

Hany Ahmed; Mohamed Elaraby; Abdullah M. Mousa; Mostafa Elhosiny; Sherif Abdou; Mohsen Rashwan

doi:10.18653/v1/W17-1310

Abstract

In this paper, we introduce an enhancement for speech recognition systems based on an unsupervised speaker clustering technique. The proposed technique relies mainly on I-vectors and a Self-Organizing Map (SOM) neural network. The input to the proposed algorithm is a set of speech utterances. For each utterance, we extract a 100-dimensional I-vector, and a SOM is then used to group the utterances by speaker. In our experiments, we compared our technique with Normalized Cross Likelihood Ratio (NCLR) clustering. Results show that the proposed technique reduces the speaker error rate in comparison with NCLR. Finally, we examined the effect of speaker clustering on Speaker Adaptive Training (SAT) in a speech recognition system implemented to test the performance of the proposed technique. The proposed technique reduced the word error rate (WER) relative to clustering speakers with NCLR.
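
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the SOM topology, training schedule, or distance measure, so the one-dimensional map, the linearly decaying learning rate and neighbourhood width, the Euclidean distance on length-normalised vectors, and the function names `train_som` and `assign_clusters` are all assumptions made for illustration. Random 100-dimensional vectors stand in for real I-vectors, which would normally come from a separate extractor.

```python
import numpy as np


def train_som(data, n_units=3, n_epochs=30, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map and return its (n_units, dim) weights.

    Hypothetical helper: the topology and decay schedule are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Initialise each map unit from a randomly chosen training vector.
    weights = data[rng.choice(len(data), size=n_units, replace=False)].copy()
    positions = np.arange(n_units)  # unit coordinates on the 1-D map
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for idx in rng.permutation(len(data)):
            x = data[idx]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # decaying neighbourhood width
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighbourhood centred on the best-matching unit.
            h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_clusters(data, weights):
    """Label each vector with the index of its nearest map unit."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


# Synthetic stand-ins for I-vectors: 3 "speakers", 20 utterances each.
rng = np.random.default_rng(42)
centers = rng.normal(size=(3, 100))
ivectors = np.vstack([c + 0.1 * rng.normal(size=(20, 100)) for c in centers])
# Length-normalise so that Euclidean distance tracks cosine similarity.
ivectors /= np.linalg.norm(ivectors, axis=1, keepdims=True)

som_weights = train_som(ivectors, n_units=3)
labels = assign_clusters(ivectors, som_weights)
print(labels)
```

Each map unit plays the role of one candidate speaker cluster, so the number of units bounds the number of speakers the sketch can separate. In practice that number is unknown in advance and would have to be chosen or estimated.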

Anthology ID: W17-1310
Volume: Proceedings of the Third Arabic Natural Language Processing Workshop
Month: April
Year: 2017
Address: Valencia, Spain
Editors: Nizar Habash, Mona Diab, Kareem Darwish, Wassim El-Hajj, Hend Al-Khalifa, Houda Bouamor, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue: WANLP
SIG: SEMITIC
Publisher: Association for Computational Linguistics
Pages: 79–83
URL: https://aclanthology.org/W17-1310
DOI: 10.18653/v1/W17-1310
Cite (ACL): Hany Ahmed, Mohamed Elaraby, Abdullah M. Mousa, Mostafa Elhosiny, Sherif Abdou, and Mohsen Rashwan. 2017. An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems. In Proceedings of the Third Arabic Natural Language Processing Workshop, pages 79–83, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal): An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems (Ahmed et al., WANLP 2017)
PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/W17-1310.pdf