Hemant Yadav


A Survey of Multilingual Models for Automatic Speech Recognition
Hemant Yadav | Sunayana Sitaram
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Although Automatic Speech Recognition (ASR) systems have achieved human-like performance for a few languages, the majority of the world’s languages do not have usable systems due to the lack of large speech datasets to train these models. Cross-lingual transfer is an attractive solution to this problem, because low-resource languages can potentially benefit from higher-resource languages either through transfer learning or by being jointly trained in the same multilingual model. The problem of cross-lingual transfer has been well studied in ASR; however, recent advances in self-supervised learning are opening up avenues for unlabeled speech data to be used in multilingual ASR models, which can pave the way for improved performance on low-resource languages. In this paper, we survey the state of the art in multilingual ASR models that are built with cross-lingual transfer in mind. We present best practices for building multilingual models from research across diverse languages and techniques, discuss open questions, and provide recommendations for future work.


MIDAS at SemEval-2020 Task 10: Emphasis Selection Using Label Distribution Learning and Contextual Embeddings
Sarthak Anand | Pradyumna Gupta | Hemant Yadav | Debanjan Mahata | Rakesh Gosangi | Haimin Zhang | Rajiv Ratn Shah
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents our submission to SemEval-2020 Task 10 on emphasis selection in written text. We approach this emphasis selection problem as a sequence labeling task where we represent the underlying text with various contextual embedding models. We also employ label distribution learning to account for annotator disagreements. We experiment with the choice of model architectures, trainability of layers, and different contextual embeddings. Our best-performing architecture is an ensemble of different models, which achieved an overall matching score of 0.783, placing us 15th out of 31 participating teams. Lastly, we analyze the results in terms of part-of-speech tags, sentence lengths, and word ordering.