NMT and PBSMT Error Analyses in English to Brazilian Portuguese Automatic Translations

Helena Caseli, Marcio Inácio


Abstract
Machine Translation (MT) is one of the most important natural language processing applications. Regardless of the approach applied, an MT system automatically generates an equivalent version (in some target language) of an input sentence (in some source language). Recently, a new MT approach has been proposed: neural machine translation (NMT). NMT systems have already outperformed traditional phrase-based statistical machine translation (PBSMT) systems for some language pairs. However, every MT approach produces errors. In this work we present a comparative study of the MT errors generated by an NMT system and a PBSMT system trained on the same English – Brazilian Portuguese parallel corpus. This is the first study of this kind involving NMT for Brazilian Portuguese. Furthermore, the analyses and conclusions presented here point out the specific problems of NMT outputs in relation to PBSMT ones and also give many insights into how to implement automatic post-editing for an NMT system. Finally, the corpora annotated with MT errors generated by both PBSMT and NMT systems are also available.
Anthology ID:
2020.lrec-1.446
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3623–3629
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.446
Cite (ACL):
Helena Caseli and Marcio Inácio. 2020. NMT and PBSMT Error Analyses in English to Brazilian Portuguese Automatic Translations. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3623–3629, Marseille, France. European Language Resources Association.
Cite (Informal):
NMT and PBSMT Error Analyses in English to Brazilian Portuguese Automatic Translations (Caseli & Inácio, LREC 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.446.pdf