Meysam Shamsi

2026

Central Kurdish Text-to-Speech and Its Application in Speech-to-Text Translation
Mohammad Mohammadamini | Meysam Shamsi | Marie Tahon
Proceedings of the Fifteenth Language Resources and Evaluation Conference

In this study, we show how to develop high-quality Text-to-Speech (TTS) models for low-resource scenarios from available resources. According to our extensive evaluation, these models surpass models trained on dedicated TTS data recorded in a studio. We develop three TTS models for Central Kurdish, a low-resource language, using the F5-TTS architecture. The models are trained on Central Kurdish TTS datasets, two of which were curated from audiobooks during this study, while the third is evaluated here for the first time. We also demonstrate the potential of TTS models for developing other speech technologies in low-resource languages by proposing a speech synthesis framework used in a speech-to-text translation application, achieving promising results on standard speech translation benchmarks. The curated TTS resources and models will be publicly available under the CC BY-NC-ND 4.0 license.

2020


TTS voice corpus reduction for audio-book generation
Meysam Shamsi
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 3 : Rencontre des Étudiants Chercheurs en Informatique pour le TAL

Nowadays, with new voice corpora emerging, voice corpus reduction in expressive TTS is becoming more important. In this study, a splitting greedy approach is investigated for removing utterances. In the first step, by comparing five objective measures, the TTS global cost was found to be the best available metric for approximating perceptual quality. The greedy algorithm employs this measure to evaluate the candidates at each step and the synthetic quality resulting from its solution. It turned out that reducing the voice corpus size down to a certain length (1 hour in our experiment) does not degrade the synthetic quality. By modifying the original greedy algorithm, its computation time is reduced to a reasonable duration. Two perceptual tests were run to compare this greedy method with a random strategy for voice corpus reduction. They revealed that the proposed greedy approach offers no advantage over the random strategy for corpus reduction.
