Krishnan Ramakrishnan
2019
Non-native Accent Partitioning for Speakers of Indian Regional Languages
Radha Krishna Guntur
|
Krishnan Ramakrishnan
|
Vinay Kumar Mittal
Proceedings of the 16th International Conference on Natural Language Processing
Acoustic features extracted from the speech signal can help in identifying speaker related multiple information such as geographical origin, regional accent and nativity. In this paper, classification of native speakers of South Indian languages is carried out based upon the accent of their non-native language, i.e., English. Four South Indian languages: Kannada, Malayalam, Tamil, and Telugu are examined. A database of English speech from the native speakers of these languages, along with the native language speech data was collected, from a non-overlapping set of speakers. Segment level acoustic features F0 and Mel-frequency cepstral coefficients (MFCCs) are used. Accent partitioning of non-native English speech data is carried out using multiple classifiers: k-nearest neighbour (KNN), linear discriminant analysis (LDA) and support vector machine (SVM), for validation and comparison of results. Classification accuracies of 86.6% are observed using KNN, and 89.2% or more than 90% using SVM classifier. A study of acoustic feature F0 contour, related to L2 intonation, showed that native speakers of Kannada language are quite distinct as compared to those of Tamil or Telugu languages. It is also observed that identification of Malayalam and Kannada speakers from their English speech accent is relatively easier than Telugu or Tamil speakers.