On Construction of the ASR-oriented Indian English Pronunciation Dictionary

Xian Huang; Xin Jin; Qike Li; Keliang Zhang

Abstract

As a World English, a New English and a regional variety of English, Indian English (IE) has developed distinctive characteristics, especially phonological ones, that set it apart from other varieties of English. An Automatic Speech Recognition (ASR) system trained only on British English (BE) or American English (AE) speech data and using a BE/AE pronunciation dictionary performs much worse when applied to IE. A practical IE ASR system needs spontaneous IE speech as training material and a comprehensive, linguistically guided IE pronunciation dictionary (IEPD) to map effectively between the acoustic model and the language model. This research builds a small corpus of spontaneous IE speech, analyzes and summarizes the phonological variation features of IE, proposes an IE phoneme set, and compiles the IEPD (comprising a common-English-word list, an Indian-word list, an acronym list and an affix list). Finally, two ASR systems are trained on 120 hours of spontaneous IE speech data, one using the IEPD constructed in this study and the other using CMUdict. Both systems are tested on 50 audio clips of spontaneous IE speech. The results show that the system trained with the IEPD outperforms the one trained with CMUdict, with a word error rate (WER) 15.63% lower on the test data.
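
For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the example sentences are hypothetical, and the abstract does not state which scoring tool the authors used, so this illustrates the standard definition rather than their exact procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example, not taken from the paper's test data.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

In this made-up example the hypothesis contains one substitution and one deletion against six reference words, so the score is 2/6. A lower WER means fewer recognition errors.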

Anthology ID: 2020.lrec-1.812
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 6593–6598
Language: English
URL: https://preview.aclanthology.org/ingest-emnlp/2020.lrec-1.812/
Cite (ACL): Xian Huang, Xin Jin, Qike Li, and Keliang Zhang. 2020. On Construction of the ASR-oriented Indian English Pronunciation Dictionary. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6593–6598, Marseille, France. European Language Resources Association.
Cite (Informal): On Construction of the ASR-oriented Indian English Pronunciation Dictionary (Huang et al., LREC 2020)
PDF: https://preview.aclanthology.org/ingest-emnlp/2020.lrec-1.812.pdf
