Cross-Lingual Speaker Identification for Indian Languages
Amaan Rizvi, Anupam Jamatia, Dwijen Rudrapal, Kunal Chakma, Björn Gambäck
Abstract
The paper introduces a cross-lingual speaker identification system for Indian languages that uses a Long Short-Term Memory dense neural network (LSTM-DNN). The system was trained on English audio recordings and evaluated on Hindi, Kannada, Malayalam, Tamil, and Telugu data, to examine how factors such as phonetic similarity and native accent affect performance. The LSTM-DNN was fed mel-frequency cepstral coefficient (MFCC) features extracted from the audio files. For comparison, the corresponding mel-spectrogram images were used as input to a ResNet-50 model, and the raw audio was used to train a Siamese network. The LSTM-DNN outperformed these two models as well as two more traditional baseline speaker identification models, showing that deep learning models are better than probabilistic models at capturing low-level speech features and learning speaker characteristics.
- Anthology ID: 2023.ranlp-1.105
- Volume: Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
- Month: September
- Year: 2023
- Address: Varna, Bulgaria
- Editors: Ruslan Mitkov, Galia Angelova
- Venue: RANLP
- Publisher: INCOMA Ltd., Shoumen, Bulgaria
- Pages: 979–987
- URL: https://aclanthology.org/2023.ranlp-1.105
- Cite (ACL): Amaan Rizvi, Anupam Jamatia, Dwijen Rudrapal, Kunal Chakma, and Björn Gambäck. 2023. Cross-Lingual Speaker Identification for Indian Languages. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 979–987, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal): Cross-Lingual Speaker Identification for Indian Languages (Rizvi et al., RANLP 2023)
- PDF: https://preview.aclanthology.org/fix-volume-bibkeys/2023.ranlp-1.105.pdf
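The abstract states that the LSTM-DNN was fed MFCC features but does not give the extraction settings. For illustration only, the following is a minimal, dependency-free Python sketch of a standard MFCC pipeline: pre-emphasis, Hamming-windowed framing, power spectrum, triangular mel filterbank, log compression, and a DCT-II. All function names and parameter values (16 kHz audio, 25 ms frames with a 10 ms hop, a 512-point DFT, 26 filters, 13 coefficients, pre-emphasis 0.97) are assumptions drawn from common MFCC recipes, not the authors' configuration.

```python
import math

# Assumed defaults (NOT from the paper): 16 kHz audio, 25 ms frames (400 samples),
# 10 ms hop (160 samples), 512-point DFT, 26 mel filters, 13 coefficients.


def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):  # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge, peak of 1.0 at centre
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft):
    """Periodogram |X[k]|^2 / n_fft for k = 0..n_fft/2, via a naive O(n^2) DFT."""
    x = frame + [0.0] * (n_fft - len(frame))  # zero-pad the frame to n_fft samples
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2.0 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        im = -sum(v * math.sin(2.0 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II, keeping only the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13, preemph=0.97):
    """Return one list of n_coeffs MFCCs per frame (requires frame_len <= n_fft)."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - a * x[n-1].
    emph = [signal[0]] + [signal[i] - preemph * signal[i - 1] for i in range(1, len(signal))]
    # Hamming window reduces spectral leakage at the frame edges.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    frames = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [v * w for v, w in zip(emph[start:start + frame_len], window)]
        spec = power_spectrum(frame, n_fft)
        energies = [sum(f * s for f, s in zip(row, spec)) for row in bank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # small floor avoids log(0)
        frames.append(dct_ii(log_energies, n_coeffs))
    return frames


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(800)]  # 0.1 s at 440 Hz
    feats = mfcc(tone, sample_rate=sr, frame_len=200, hop=80, n_fft=256, n_filters=20)
    print(len(feats), len(feats[0]))  # prints "8 13": 8 frames of 13 coefficients
```

The resulting per-frame coefficient sequence is the kind of time-ordered input an LSTM consumes. A naive DFT keeps the sketch self-contained; a practical pipeline would use an FFT, for example through an audio library such as librosa.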