The MT@BZ corpus: machine translation & legal language
Flavia De Camillis, Egon W. Stemle, Elena Chiocchetti, Francesco Fernicola
Abstract
The paper reports on the creation, annotation and curation of the MT@BZ corpus, a bilingual (Italian–South Tyrolean German) corpus of machine-translated legal texts from the officially multilingual Province of Bolzano, Italy. It is the first human error-annotated corpus (using an adapted SCATE taxonomy) of machine-translated legal texts in this language combination that includes a lesser-used standard variety. The data of the project will be made available on GitHub and another repository. The output of the customized engine achieved notably better BLEU, TER and chrF2 scores than the baseline. Thanks to customization, over 50% of the segments needed no human revision. The most frequent error categories were mistranslations and bilingual (legal) terminology errors. Our contribution brings fine-grained insights to machine translation evaluation research, as it concerns a less common language combination, a lesser-used language variety and a societally relevant specialized domain. Such results are necessary to implement and inform the use of MT in institutional contexts of smaller language communities.
- Anthology ID:
- 2023.eamt-1.17
- Volume:
- Proceedings of the 24th Annual Conference of the European Association for Machine Translation
- Month:
- June
- Year:
- 2023
- Address:
- Tampere, Finland
- Editors:
- Mary Nurminen, Judith Brenner, Maarit Koponen, Sirkku Latomaa, Mikhail Mikhailov, Frederike Schierl, Tharindu Ranasinghe, Eva Vanmassenhove, Sergi Alvarez Vidal, Nora Aranberri, Mara Nunziatini, Carla Parra Escartín, Mikel Forcada, Maja Popovic, Carolina Scarton, Helena Moniz
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation
- Pages:
- 171–180
- URL:
- https://aclanthology.org/2023.eamt-1.17
- Cite (ACL):
- Flavia De Camillis, Egon W. Stemle, Elena Chiocchetti, and Francesco Fernicola. 2023. The MT@BZ corpus: machine translation & legal language. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 171–180, Tampere, Finland. European Association for Machine Translation.
- Cite (Informal):
- The MT@BZ corpus: machine translation & legal language (De Camillis et al., EAMT 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2023.eamt-1.17.pdf