Anupam Singh


A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data
Raviraj Joshi | Anupam Singh
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)

Automatic Speech Recognition (ASR) has been dominated by deep learning-based end-to-end speech recognition models. These approaches require large amounts of labeled data in the form of audio-text pairs. Moreover, these models are more susceptible to domain shift than traditional models. It is common practice to train generic ASR models and then adapt them to target domains using comparatively smaller data sets. We consider a more extreme case of domain adaptation where only a text corpus is available. In this work, we propose a simple baseline technique for domain adaptation in end-to-end speech recognition models. We convert the text-only corpus to audio data using a single-speaker text-to-speech (TTS) engine. The resulting parallel data in the target domain is then used to fine-tune the final dense layer of generic ASR models. We show that single-speaker synthetic TTS data coupled with fine-tuning only the final dense layer provides reasonable improvements in word error rates. We use text data from address and e-commerce search domains to show the effectiveness of our low-cost baseline approach on CTC and attention-based models.

End-to-End Speech to Intent Prediction to improve E-commerce Customer Support Voicebot in Hindi and English
Abhinav Goyal | Anupam Singh | Nikesh Garera
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

Automation of on-call customer support relies heavily on accurate and efficient speech-to-intent (S2I) systems. Building such systems using multi-component pipelines can pose various challenges because they require large annotated datasets, have higher latency, and are complex to deploy. These pipelines are also prone to compounding errors. To overcome these challenges, we discuss an end-to-end (E2E) S2I model for a customer support voicebot task in a bilingual setting. We show how we can solve E2E intent classification by leveraging a pre-trained automatic speech recognition (ASR) model with slight modification and fine-tuning on small annotated datasets. Experimental results show that our best E2E model outperforms a conventional pipeline by a relative ~27% in F1 score.


Exploring System Combination approaches for Indo-Aryan MT Systems
Karan Singla | Anupam Singh | Nishkarsh Shastri | Megha Jhunjhunwala | Srinivas Bangalore | Dipti Misra Sharma
Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants