EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion
Advait Joglekar, Divyanshu Singh, Rooshil Rohit Bhatia, Srinivasan Umesh
Abstract
Voice Conversion research in recent times has increasingly focused on improving the zero-shot capabilities of existing methods. Despite remarkable advancements, current architectures still tend to struggle in zero-shot cross-lingual settings. They are also often unable to generalize for speakers of unseen languages and accents. In this paper, we adopt a simple yet effective approach that combines discrete speech representations from self-supervised models with a non-autoregressive Diffusion-Transformer based conditional flow matching speech decoder. We show that this architecture allows us to train a voice-conversion model in a purely textless, self-supervised fashion. Our technique works without requiring multiple encoders to disentangle speech features. Our model also manages to excel in zero-shot cross-lingual settings even for unseen languages. We provide our code, model checkpoint and demo samples here: https://github.com/ez-vc/ez-vc
- Anthology ID:
- 2025.findings-emnlp.1077
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 19768–19774
- URL:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1077/
- DOI:
- 10.18653/v1/2025.findings-emnlp.1077
- Cite (ACL):
- Advait Joglekar, Divyanshu Singh, Rooshil Rohit Bhatia, and Srinivasan Umesh. 2025. EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19768–19774, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion (Joglekar et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1077.pdf
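
The abstract mentions a conditional flow matching speech decoder without giving details. As a rough illustration only, the sketch below shows one common straight-line formulation of flow matching: a noise sample and a data sample are interpolated, the regression target is their difference, and sampling integrates a velocity field with Euler steps. This is an assumption about the general technique, not the authors' implementation; the function names (`cfm_training_pair`, `cfm_loss`, `euler_sample`) are hypothetical, and the conditioning on discrete speech units and a reference speaker prompt that a real decoder would need is omitted.

```python
def cfm_training_pair(x0, x1, t):
    """Return (x_t, target_velocity) for a straight-line path from x0 to x1.

    x0: noise sample; x1: data sample (for example, one acoustic feature frame);
    t: time in [0, 1]. All names here are illustrative assumptions.
    """
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return xt, target


def cfm_loss(predicted, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


def euler_sample(velocity_fn, x0, steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise, data = [0.0, 0.0], [1.0, 2.0]
    xt, target = cfm_training_pair(noise, data, t=0.25)
    print(xt, target)
    # An oracle velocity field stands in for a trained network here.
    print(euler_sample(lambda x, t: target, noise))
```

In a trained model, `velocity_fn` would be a neural network that also receives the conditioning inputs; the oracle lambda above only demonstrates that following the target velocity carries the noise sample to the data sample.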