Abstract
This work presents improvements of a large-scale Arabic to French statistical machine translation system over a period of three years. The development includes better preprocessing, more training data, additional genre-specific tuning for different domains, namely newswire text and broadcast news transcripts, and improved domain-dependent language models. Starting with an early prototype in 2005 that participated in the second CESTA evaluation, the system was further upgraded to achieve favorable BLEU scores of 44.8% for the text and 41.1% for the audio setting. These results are compared to a system based on the freely available Moses toolkit. We show significant gains both in terms of translation quality (up to +1.2% BLEU absolute) and translation speed (up to 16 times faster) for comparable configuration settings.- Anthology ID:
- L08-1134
- Volume:
- Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
- Month:
- May
- Year:
- 2008
- Address:
- Marrakech, Morocco
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2008/pdf/549_paper.pdf
- DOI:
- Cite (ACL):
- Saša Hasan and Hermann Ney. 2008. A Multi-Genre SMT System for Arabic to French. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
- Cite (Informal):
- A Multi-Genre SMT System for Arabic to French (Hasan & Ney, LREC 2008)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2008/pdf/549_paper.pdf