Bilal Faye
2026
PUMA: Projected Universal Multilingual ASR for Low-Resource Settings. Application to Diverse African Languages
Ilyes Oukid | Bilal Faye | Hanane Azzag | Mustapha Lebbah | Said Yacine Boulahia
Findings of the Association for Computational Linguistics: ACL 2026
Multilingual ASR systems often fail to generalize to low-resource and linguistically diverse languages while remaining costly to scale. We introduce PUMA, a unified multilingual ASR model that improves low-resource performance with reduced model complexity. PUMA employs a Universal Language Projection (ULP) module that integrates a learnable language token with acoustic representations, enabling language-aware processing through shared parameters. Experiments on diverse African languages show consistent word error rate reductions over strong multilingual baselines, highlighting improved robustness and generalization. Our code is available at the following GitHub URL: https://github.com/ilyes-okd/PUMA
2021
The SPECTRANS System Description for the WMT21 Terminology Task
Nicolas Ballier | Dahn Cho | Bilal Faye | Zong-You Ke | Hanna Martikainen | Mojca Pecman | Guillaume Wisniewski | Jean-Baptiste Yunès | Lichao Zhu | Maria Zimina-Poirot
Proceedings of the Sixth Conference on Machine Translation
This paper discusses the WMT 2021 terminology shared task from a “meta” perspective. We present the results of our experiments using the terminology dataset and the OpenNMT (Klein et al., 2017) and JoeyNMT (Kreutzer et al., 2019) toolkits for the language direction English to French. Our experiment 1 compares the predictions of the two toolkits. Experiment 2 uses OpenNMT to fine-tune the model. We report our results for the task with the evaluation script but mostly discuss the linguistic properties of the terminology dataset provided for the task. We provide evidence of the importance of text genres across scores, having replicated the evaluation scripts.