Sileye O. Ba


2020

SENCORPUS: A French-Wolof Parallel Corpus
Elhadji Mamadou Nguer | Alla Lo | Cheikh M. Bamba Dione | Sileye O. Ba | Moussa Lo
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we report efforts towards the acquisition and construction of a bilingual parallel corpus between French and Wolof, a Niger-Congo language belonging to the Northern branch of the Atlantic group. The corpus is constructed as part of the SYSNET3LOc project. It currently contains about 70,000 French-Wolof parallel sentences drawn from various sources in different domains. The paper discusses the data collection procedure, conversion, and alignment of the corpus as well as its application as training data for neural machine translation. In fact, using this corpus, we were able to create word embedding models for Wolof with relatively good results. Currently, the corpus is being used to develop a neural machine translation model to translate French sentences into Wolof.