Abstract
This paper addresses the translation of English documents into Oromo using statistical methods. Whereas English is the lingua franca of online information, Oromo, despite its relatively wide distribution within Ethiopia and neighbouring countries such as Kenya and Somalia, is one of the most resource-scarce languages. The paper has two main goals. The first is to test how far we can go with the limited parallel corpus available for the English-Oromo language pair, and to assess the applicability of existing Statistical Machine Translation (SMT) systems to this pair. The second is to analyze the output of the system in order to identify the challenges that need to be tackled. Because the language is resource-scarce, we could not obtain as many parallel documents as we wanted for the experiment. Nevertheless, using a limited corpus of 20,000 bilingual sentences and 163,000 monolingual sentences, we achieved a BLEU score of 17.74%.
- Anthology ID:
- L10-1470
- Volume:
- Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
- Month:
- May
- Year:
- 2010
- Address:
- Valletta, Malta
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- URL:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/683_Paper.pdf
- Cite (ACL):
- Sisay Adugna and Andreas Eisele. 2010. English — Oromo Machine Translation: An Experiment Using a Statistical Approach. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
- Cite (Informal):
- English — Oromo Machine Translation: An Experiment Using a Statistical Approach (Adugna & Eisele, LREC 2010)