Advait Joglekar


2025

EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion
Advait Joglekar | Divyanshu Singh | Rooshil Rohit Bhatia | Srinivasan Umesh
Findings of the Association for Computational Linguistics: EMNLP 2025

Voice Conversion research has recently focused on improving the zero-shot capabilities of existing methods. Despite remarkable advances, current architectures still tend to struggle in zero-shot cross-lingual settings and often fail to generalize to speakers of unseen languages and accents. In this paper, we adopt a simple yet effective approach that combines discrete speech representations from self-supervised models with a non-autoregressive, Diffusion-Transformer-based conditional flow matching speech decoder. We show that this architecture allows us to train a voice-conversion model in a purely textless, self-supervised fashion. Our technique works without requiring multiple encoders to disentangle speech features, and our model excels in zero-shot cross-lingual settings even for unseen languages. We provide our code, model checkpoint, and demo samples here: https://github.com/ez-vc/ez-vc
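
A minimal sketch of the core training idea described in the abstract, not the authors' released code: discrete-unit embeddings condition a decoder trained with the standard optimal-transport conditional flow matching objective. The `FlowDecoder` class, tensor shapes, and feature names below are illustrative assumptions.

```python
# Hypothetical sketch of textless VC training with conditional flow matching.
# FlowDecoder is a toy stand-in for a Diffusion-Transformer decoder.
import torch
import torch.nn as nn

class FlowDecoder(nn.Module):
    """Toy stand-in: predicts the velocity field v(x_t, t | units)."""
    def __init__(self, mel_dim=80, unit_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(mel_dim + unit_dim + 1, 512), nn.GELU(),
            nn.Linear(512, mel_dim),
        )

    def forward(self, x_t, t, units):
        t = t.expand(x_t.size(0), x_t.size(1), 1)  # broadcast time over frames
        return self.net(torch.cat([x_t, units, t], dim=-1))

def cfm_loss(model, mel, units):
    """Optimal-transport conditional flow matching loss."""
    x1 = mel                                # data: target mel frames
    x0 = torch.randn_like(x1)               # noise sample
    t = torch.rand(x1.size(0), 1, 1)        # per-example time in [0, 1)
    x_t = (1 - t) * x0 + t * x1             # straight-line probability path
    v_target = x1 - x0                      # constant target velocity
    return ((model(x_t, t, units) - v_target) ** 2).mean()

# Usage with random tensors standing in for real features:
model = FlowDecoder()
mel = torch.randn(4, 100, 80)     # batch of target mel-spectrograms
units = torch.randn(4, 100, 512)  # upsampled discrete-unit embeddings
loss = cfm_loss(model, mel, units)
loss.backward()
```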

Effectively combining Phi-4 and NLLB for Spoken Language Translation: SPRING Lab IITM’s submission to Low Resource Multilingual Indic Track
Sankalpa Sarkar | Samriddhi Kashyap | Advait Joglekar | Srinivasan Umesh
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)

This paper presents the methodologies implemented for Spoken Language Translation for the language pairs Hindi-English, Bengali-English, and Tamil-English in the Low Resource Multilingual Indic Track of the International Conference on Spoken Language Translation (IWSLT) 2025. We adopt a cascaded approach, using a fine-tuned Phi-4 multimodal instruct model for Automatic Speech Recognition (ASR) and a fine-tuned NLLB model for Machine Translation (MT).
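
A hedged sketch of such a cascade: the Phi-4 ASR step is shown only as a placeholder, and the MT step uses the public NLLB checkpoint through the standard Hugging Face transformers API (the distilled 600M model stands in for the fine-tuned models the paper uses).

```python
# Illustrative ASR -> MT cascade; not the submission's code.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def transcribe(audio_path: str) -> str:
    """Placeholder for the fine-tuned Phi-4 multimodal ASR step."""
    raise NotImplementedError("Run your ASR model here.")

def translate_to_english(text: str, src_lang: str = "hin_Deva") -> str:
    name = "facebook/nllb-200-distilled-600M"  # stand-in checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name, src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    inputs = tokenizer(text, return_tensors="pt")
    out = model.generate(
        **inputs,
        # NLLB decodes into the language named by the forced BOS token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
        max_new_tokens=128,
    )
    return tokenizer.batch_decode(out, skip_special_tokens=True)[0]

# hindi_text = transcribe("sample.wav")
# print(translate_to_english(hindi_text))
```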

2024

SPRING Lab IITM’s Submission to Low Resource Indic Language Translation Shared Task
Hamees Sayed | Advait Joglekar | Srinivasan Umesh
Proceedings of the Ninth Conference on Machine Translation

We develop a robust translation model for four low-resource Indic languages: Khasi, Mizo, Manipuri, and Assamese. Our approach includes a comprehensive pipeline from data collection and preprocessing to training and evaluation, leveraging data from WMT task datasets, BPCC, PMIndia, and OpenLanguageData. To address the scarcity of bilingual data, we use back-translation techniques on monolingual datasets for Mizo and Khasi, significantly expanding our training corpus. We fine-tune the pre-trained NLLB 3.3B model for Assamese, Mizo, and Manipuri, achieving improved performance over the baseline. For Khasi, which is not supported by the NLLB model, we introduce special tokens and train the model on our Khasi corpus. Our training involves masked language modelling, followed by fine-tuning for English-to-Indic and Indic-to-English translations.
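
A sketch of the two data/model tweaks described above, again illustrative rather than the submission's code: back-translating monolingual text with an existing model to form synthetic bitext, and registering a new language token for Khasi, which NLLB does not support. The translation direction and the tag name "kha_Latn" are assumptions for illustration.

```python
# (1) Back-translation for synthetic bitext; (2) a new language token.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "facebook/nllb-200-distilled-600M"  # stand-in for NLLB 3.3B
tokenizer = AutoTokenizer.from_pretrained(name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(name)

def back_translate(english_sentences, tgt_lang="lus_Latn"):
    """Translate monolingual English into Mizo, then pair the synthetic
    output with the original sentences as (source, target) bitext."""
    inputs = tokenizer(english_sentences, return_tensors="pt", padding=True)
    out = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=128,
    )
    synthetic = tokenizer.batch_decode(out, skip_special_tokens=True)
    return list(zip(synthetic, english_sentences))

# New language token for Khasi ("kha_Latn" is a hypothetical tag name),
# followed by resizing the embedding matrix to cover it.
tokenizer.add_special_tokens({"additional_special_tokens": ["kha_Latn"]})
model.resize_token_embeddings(len(tokenizer))
```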