EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion

Advait Joglekar, Divyanshu Singh, Rooshil Rohit Bhatia, Srinivasan Umesh


Abstract
Recent voice conversion research has increasingly focused on improving the zero-shot capabilities of existing methods. Despite remarkable advancements, current architectures still tend to struggle in zero-shot cross-lingual settings, and they often fail to generalize to speakers of unseen languages and accents. In this paper, we adopt a simple yet effective approach that combines discrete speech representations from self-supervised models with a non-autoregressive, Diffusion-Transformer-based conditional flow matching speech decoder. We show that this architecture allows us to train a voice conversion model in a purely textless, self-supervised fashion. Our technique does not require multiple encoders to disentangle speech features, and our model excels in zero-shot cross-lingual settings, even for unseen languages. We provide our code, model checkpoint, and demo samples here: https://github.com/ez-vc/ez-vc
Anthology ID:
2025.findings-emnlp.1077
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
19768–19774
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1077/
DOI:
10.18653/v1/2025.findings-emnlp.1077
Cite (ACL):
Advait Joglekar, Divyanshu Singh, Rooshil Rohit Bhatia, and Srinivasan Umesh. 2025. EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19768–19774, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion (Joglekar et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1077.pdf
Checklist:
 2025.findings-emnlp.1077.checklist.pdf