FFSTC 2: Extending the Fongbe to French Speech Translation Corpus

D. Fortuné KPONOU, Salima Mdhaffar, Fréjus A. A. Laleye, Eugène Cokou Ezin, Yannick Estève


Abstract
This paper introduces FFSTC 2, an expanded version of the existing Fongbe-to-French speech translation corpus, addressing the critical need for resources in African dialects for speech recognition and translation tasks. We extend the dataset by adding 36 hours of transcribed audio, bringing the total to 61 hours, thereby enhancing its utility for both automatic speech recognition (ASR) and speech translation (ST) in Fongbe, a low-resource language. Using this enriched corpus, we develop both cascade and end-to-end speech translation systems. Our models employ AfriHuBERT and HuBERT147, two speech encoders specialized for African languages, and the NLLB and mBART models as decoders. We also investigate the use of the SAMU-XLSR approach to inject sentence-level semantic information into the XLSR-128 model, used as an alternative speech encoder. We further introduce a novel diacritic-substitution technique for ASR which, when combined with NLLB, enables a cascade model to achieve a BLEU score of 37.23, compared to 39.60 obtained by the best system using original diacritics. Among the end-to-end architectures evaluated, those with data augmentation and NLLB as decoder achieved the highest scores, with SAMU-NLLB reaching a BLEU score of 28.43.
Anthology ID:
2025.iwslt-1.13
Volume:
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria (in-person and online)
Editors:
Elizabeth Salesky, Marcello Federico, Antonis Anastasopoulos
Venues:
IWSLT | WS
Publisher:
Association for Computational Linguistics
Pages:
145–152
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.iwslt-1.13/
Cite (ACL):
D. Fortuné KPONOU, Salima Mdhaffar, Fréjus A. A. Laleye, Eugène Cokou Ezin, and Yannick Estève. 2025. FFSTC 2: Extending the Fongbe to French Speech Translation Corpus. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), pages 145–152, Vienna, Austria (in-person and online). Association for Computational Linguistics.
Cite (Informal):
FFSTC 2: Extending the Fongbe to French Speech Translation Corpus (Fortuné KPONOU et al., IWSLT 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.iwslt-1.13.pdf