Mohamed Nabih Ali Mohamed Nawar
2026
Phonetic-based Ranking for Improved Pseudo-Labeling in Low-Resource ASR
Marco Matassoni | Roberto Gretter | Falavigna Daniele | Mohamed Nabih Ali Mohamed Nawar | Alessio Brutti | Matteo Negri | Mauro Cettolo | Marco Gaido | Sara Papi | Luisa Bentivogli
Proceedings of the Fifteenth Language Resources and Evaluation Conference
The rise of large language models has boosted speech and language technologies; however, where transcripts of audio data are limited, the performance of current technology is not yet satisfactory. One common strategy to tackle data scarcity is to leverage pseudo-labels, for example by automatically transcribing data with a pre-trained ASR model. A critical issue with this approach is assessing the quality of the automatic transcriptions, which may be poor for low-resource languages. While several filtering approaches exist in the literature, they typically work with decent pre-trained ASR models but may fail otherwise. In this work we propose a phonetic-based ranking that enables effective selection with controllable computational resources; the resulting subset of pseudo-labels serves as additional material for fine-tuning the source ASR models. Experiments on common benchmarks in three low-resource languages demonstrate the effectiveness of the proposed approach, yielding up to a 3-point reduction in WER.
2025
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
Sara Papi | Marco Gaido | Luisa Bentivogli | Alessio Brutti | Mauro Cettolo | Roberto Gretter | Marco Matassoni | Mohamed Nabih Ali Mohamed Nawar | Matteo Negri
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)