Vocabulary accuracy of statistical machine translation in the legal context

Jeffrey Killman


Abstract
This paper examines the accuracy of free online statistical machine translation (SMT) output provided by Google Translate (GT) in the difficult context of legal translation. It analyzes the English machine translations GT produces for a large sample of Spanish legal vocabulary items drawn from a voluminous text of judgment summaries produced by the Supreme Court of Spain. Prior to this study, the same text was translated into English without machine translation (MT), and it was found that the majority of the translation solutions chosen for these vocabulary items could be hand-selected from databases, mostly EU ones, with versions in English and Spanish. The paper argues that MT should be worthwhile in the legal translation context if its output can consistently provide a reasonable number of accurate translations of the types of vocabulary items that translators in this context often have to research before they can translate them effectively. Much of the currently available translated text used to train SMT comes from international organizations, such as the EU and the UN, which often write about legal matters. Moreover, SMT can use the immediate co-text of vocabulary items to attempt to identify correct translations in its database.
Anthology ID:
2014.amta-wptp.7
Volume:
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas
Month:
October 22-26
Year:
2014
Address:
Vancouver, Canada
Editors:
Sharon O'Brien, Michel Simard, Lucia Specia
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
85–98
URL:
https://aclanthology.org/2014.amta-wptp.7
Cite (ACL):
Jeffrey Killman. 2014. Vocabulary accuracy of statistical machine translation in the legal context. In Proceedings of the 11th Conference of the Association for Machine Translation in the Americas, pages 85–98, Vancouver, Canada. Association for Machine Translation in the Americas.
Cite (Informal):
Vocabulary accuracy of statistical machine translation in the legal context (Killman, AMTA 2014)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2014.amta-wptp.7.pdf