Marina Sanchez


2022

Unsupervised Machine Translation in Real-World Scenarios
Ona de Gibert Bonet | Iakes Goenaga | Jordi Armengol-Estapé | Olatz Perez-de-Viñaspre | Carla Parra Escartín | Marina Sanchez | Mārcis Pinnis | Gorka Labaka | Maite Melero
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present the work carried out in the MT4All CEF project and the resources it has generated by leveraging recent research in unsupervised learning. In the course of the project, 18 monolingual corpora for specific domains and languages have been collected, and 12 bilingual dictionaries and translation models have been generated. As part of the research, the unsupervised MT methodology based only on monolingual corpora (Artetxe et al., 2017) has been tested on a variety of languages and domains. Results show that in specialised domains, when there is enough monolingual in-domain data, unsupervised results are comparable to those of general-domain supervised translation, and that, at any rate, unsupervised techniques can be used to boost results whenever very little data is available.