Abstract
India is a land of language diversity. There are approximately 2000 languages spoken around, and among which officially registered are 23. In those, there are very few with Automatic Speech Recognition (ASR) capability. The reason for this is the fact that building an ASR system requires thousands of hours of annotated speech data, a vast amount of text, and a lexicon that can span all the words in the language. At the same time, it is observed that Indian languages share a common phonetic base. In this work, we build a multilingual speech recognition system for low-resource languages by leveraging the shared phonetic space. Deep Neural architectures play a vital role in improving the performance of low-resource ASR systems. The typical strategy used to train the multilingual acoustic model is merging various languages as a unified group. In this paper, the speech recognition system is built using six Indian languages, namely Gujarati, Hindi, Marathi, Odia, Tamil, and Telugu. Various state-of-the-art experiments were performed using different acoustic modeling and language modeling techniques.