Unsupervised Machine Translation in Real-World Scenarios
Ona de Gibert Bonet, Iakes Goenaga, Jordi Armengol-Estapé, Olatz Perez-de-Viñaspre, Carla Parra Escartín, Marina Sanchez, Mārcis Pinnis, Gorka Labaka, Maite Melero
Abstract
We present the work carried out in the MT4All CEF project and the resources it has generated by leveraging recent research in unsupervised learning. In the course of the project, 18 monolingual corpora for specific domains and languages have been collected, and 12 bilingual dictionaries and translation models have been generated. As part of the research, the unsupervised MT methodology based only on monolingual corpora (Artetxe et al., 2017) has been tested on a variety of languages and domains. Results show that in specialised domains, when there is enough monolingual in-domain data, unsupervised results are comparable to those of general-domain supervised translation, and that, in any case, unsupervised techniques can be used to boost results whenever very little data is available.
- Anthology ID:
- 2022.lrec-1.325
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 3038–3047
- URL:
- https://aclanthology.org/2022.lrec-1.325
- Cite (ACL):
- Ona de Gibert Bonet, Iakes Goenaga, Jordi Armengol-Estapé, Olatz Perez-de-Viñaspre, Carla Parra Escartín, Marina Sanchez, Mārcis Pinnis, Gorka Labaka, and Maite Melero. 2022. Unsupervised Machine Translation in Real-World Scenarios. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3038–3047, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Unsupervised Machine Translation in Real-World Scenarios (de Gibert Bonet et al., LREC 2022)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2022.lrec-1.325.pdf