Error Analysis of Statistical Machine Translation Output

David Vilar, Jia Xu, Luis Fernando D’Haro, Hermann Ney


Abstract
Evaluation of automatic translation output is a difficult task. Several performance measures like Word Error Rate, Position Independent Word Error Rate and the BLEU and NIST scores are widely use and provide a useful tool for comparing different systems and to evaluate improvements within a system. However the interpretation of all of these measures is not at all clear, and the identification of the most prominent source of errors in a given system using these measures alone is not possible. Therefore some analysis of the generated translations is needed in order to identify the main problems and to focus the research efforts. This area is however mostly unexplored and few works have dealt with it until now. In this paper we will present a framework for classification of the errors of a machine translation system and we will carry out an error analysis of the system used by the RWTH in the first TC-STAR evaluation.
Anthology ID:
L06-1244
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/413_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
David Vilar, Jia Xu, Luis Fernando D’Haro, and Hermann Ney. 2006. Error Analysis of Statistical Machine Translation Output. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Error Analysis of Statistical Machine Translation Output (Vilar et al., LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/413_pdf.pdf