Mohammad Mohammadamini

2026

Southern Kurdish Speech Recognition Resources and Benchmarking
Mohammad Mohammadamini | Marie Tahon
Proceedings of the Fifteenth Language Resources and Evaluation Conference

This article introduces a dedicated speech recognition dataset for Southern Kurdish, a threatened variant of the Kurdish macrolanguage. We present 30 hours of validated read speech for training and an evaluation benchmark for Southern Kurdish Automatic Speech Recognition (ASR). Both the training data and the evaluation benchmark consist of read speech collected through crowdsourcing campaigns. In addition to a detailed description of the resources, we provide ASR baselines using the Whisper-turbo and wav2vec-bert CTC architectures. We achieve a CER of 4.09 and a WER of 24.26 on our benchmark using the wav2vec-bert model. We also provide a categorization of errors to support improvements in future studies. The resources and trained models are released under the CC BY-NC-ND 4.0 license and are publicly available at https://huggingface.co/datasets/aranemini/southern-kurdish-asr

English to Central Kurdish Speech Translation: Corpus Creation, Evaluation, and Orthographic Standardization
Mohammad Mohammadamini | Daban Jaff | Josep Crego | Marie Tahon | Antoine LAURENT
Proceedings of the Fifteenth Language Resources and Evaluation Conference

We present KUTED, a speech-to-text translation (S2TT) dataset for Central Kurdish, derived from TED and TEDx talks. The corpus comprises 91,000 sentence pairs, including 170 hours of English audio, 1.65 million English tokens, and 1.40 million Central Kurdish tokens. We evaluate KUTED on the S2TT task and find that orthographic variation significantly degrades Kurdish translation performance, producing nonstandard outputs. To address this, we propose a systematic text standardization approach that yields substantial performance gains and more consistent translations. On a test set separated from TED talks, a fine-tuned Seamless model achieves 15.18 BLEU, and we improve the Seamless baseline by 3.0 BLEU on the FLEURS benchmark. We also train a Transformer model from scratch and evaluate a cascaded system that combines Seamless (ASR) with NLLB (MT).

Central Kurdish Text-to-Speech and Its Application in Speech-to-Text Translation
Mohammad Mohammadamini | Meysam Shamsi | Marie Tahon
Proceedings of the Fifteenth Language Resources and Evaluation Conference

In this study, we show how to develop high-quality TTS models for low-resource scenarios from available resources; according to our extensive evaluation, these models surpass models trained on dedicated TTS data recorded in the studio. We develop three Text-to-Speech (TTS) models for Central Kurdish, a low-resource language, using the F5-TTS architecture. The models are trained on Central Kurdish TTS datasets, two of which were curated from audiobooks during this study, while the third is evaluated for the first time. We also demonstrate the potential of TTS models for developing other speech technologies in low-resource languages by proposing a speech synthesis framework used in a speech-to-text translation application, achieving promising results on standard speech translation benchmarks. The curated TTS resources and models will be publicly available under the CC BY-NC-ND 4.0 license.

2025

In this paper, we introduce Kuvost, a large-scale English to Central Kurdish speech-to-text translation (S2TT) dataset. The dataset includes 786k utterances derived from Common Voice 18, translated and revised by 230 volunteers into Central Kurdish. Encompassing 1,003 hours of translated speech, it can play a groundbreaking role for Central Kurdish, which severely lacks public-domain resources for speech translation. Following the dataset division in Common Voice, there are 298k, 6,226, and 7,253 samples in the train, development, and test sets, respectively. The dataset is evaluated on end-to-end English-to-Kurdish S2TT using the Whisper V3 Large and SeamlessM4T V2 Large models. The dataset is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License at https://huggingface.co/datasets/aranemini/kuvost.

2024

RoboVox: A Single/Multi-channel Far-field Speaker Recognition Benchmark for a Mobile Robot
Mohammad Mohammadamini | Driss Matrouf | Michael Rouvier | Jean-Francois Bonastre | Romain Serizel | Theophile Gonos
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we introduce a new far-field speaker recognition benchmark called RoboVox. RoboVox is a French corpus recorded by a mobile robot. The files are recorded at different distances under severe acoustic conditions, in the presence of several types of noise and reverberation. In addition to noise and reverberation, the robot’s internal noise acts as an extra additive noise. RoboVox can be used for both single-channel and multi-channel speaker recognition, and the evaluation protocols cover both cases. The results demonstrate a significant decline in performance in far-field speaker recognition and call for further research in this domain.

2022

Far-Field Speaker Recognition Benchmark Derived From The DiPCo Corpus
Mickael Rouvier | Mohammad Mohammadamini
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we present a far-field speaker verification benchmark derived from the publicly available DiPCo corpus. The benchmark comprises three different tasks that involve enrollment and test conditions with single- and/or multi-channel recordings. Its main goal is to foster research in far-field and multi-channel text-independent speaker verification. It can also be used for other tasks such as dereverberation, denoising, and speech enhancement. In addition, we release a Kaldi and SpeechBrain system to facilitate further research, and we validate the evaluation design with a single-microphone state-of-the-art speaker recognition system (i.e., ResNet-101). The results show that the proposed tasks are very challenging. We hope these resources will inspire the speech community to develop new methods and systems for this challenging domain.
