Mathew Magimai-Doss
Also published as:
Mathew Magimai Doss
While recent zero-shot multi-speaker text-to-speech (TTS) models achieve impressive results, they typically rely on extensive transcribed speech datasets from numerous speakers and intricate training pipelines. Meanwhile, self-supervised learning (SSL) speech features have emerged as effective intermediate representations for TTS. Further, SSL features from different speakers that are linearly close share phonetic information while maintaining individual speaker identity. In this study, we introduce kNN-TTS, a simple and effective framework for zero-shot multi-speaker TTS using retrieval methods that leverage the linear relationships between SSL features. Objective and subjective evaluations show that our models, trained on transcribed speech from a single speaker only, achieve performance comparable to state-of-the-art models that are trained on significantly larger training datasets. The low training data requirements mean that kNN-TTS is well suited for the development of multi-speaker TTS systems for low-resource domains and languages. We also introduce an interpolation parameter which enables fine-grained voice morphing. Demo samples are available at https://idiap.github.io/knn-tts.
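The retrieval idea described in the abstract can be sketched as follows: each source-speaker SSL frame is replaced by the average of its k nearest neighbors among the target speaker's SSL frames, and an interpolation parameter blends the retrieved features with the originals for voice morphing. This is an illustrative sketch only; the function name, the uniform neighbor weighting, and the blending formula are assumptions, not the paper's implementation.

```python
import numpy as np

def knn_convert(source_feats, target_feats, k=4, lam=1.0):
    """Replace each source SSL frame with the mean of its k nearest
    target-speaker frames, then interpolate with the original frame.

    lam=1.0 gives full conversion toward the target voice; lam=0.0
    keeps the source voice; intermediate values morph between the two.
    (Hypothetical sketch, not the kNN-TTS codebase.)
    """
    # Pairwise squared Euclidean distances: (n_src_frames, n_tgt_frames)
    d2 = ((source_feats[:, None, :] - target_feats[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]          # indices of k nearest target frames
    retrieved = target_feats[idx].mean(axis=1)   # uniform average of neighbors
    return lam * retrieved + (1.0 - lam) * source_feats
```

In a full pipeline, the converted feature sequence would then be passed to a vocoder trained to synthesize speech from SSL features.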
HMMs were among the first models applied to sign recognition and have become baseline models owing to their success in modeling sequential, multivariate data. Despite their extensive use for sign recognition, determining the HMM structure remains a challenge, especially when the number of signs to be modeled is large. In this work, we present a continuous HMM framework for modeling and recognizing isolated signs that inherently performs model selection, optimizing the number of states for each sign separately during recognition. Our experiments on three datasets, namely the German sign language DGS dataset, the Turkish sign language HospiSign dataset, and the Chalearn14 dataset, show that the proposed approach yields better sign language and gesture recognition systems than selecting or presetting the number of HMM states based on k-means, and performs competitively with the case where the number of states is determined based on test-set performance.
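The per-sign model-selection idea can be sketched as follows: keep several HMM variants per sign, one for each candidate state count, score an observation sequence under all of them, and let the best-scoring variant represent the sign at recognition time, so the state count is chosen implicitly. The sketch below uses discrete emissions and a log-domain forward algorithm for simplicity; the paper uses continuous HMMs, and all function names here are hypothetical.

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in the log domain for stability."""
    alpha = log_pi + log_B[:, obs[0]]            # initialization
    for o in obs[1:]:
        # Recursion: sum over predecessor states, then emit symbol o
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)            # termination

def recognize(obs, sign_models):
    """sign_models: {sign: [(log_pi, log_A, log_B), ...]}, one tuple per
    candidate state count. Taking the max over variants selects the state
    count per sign during recognition (illustrative sketch only)."""
    scores = {sign: max(log_forward(obs, *m) for m in models)
              for sign, models in sign_models.items()}
    return max(scores, key=scores.get)
```

In practice each variant would be trained (e.g. by Baum-Welch) on the same sign's examples with a different number of states before recognition.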