A social media NMT engine for a low-resource language combination

María Do Campo Bayón, Pilar Sánchez-Gijón


Abstract
The aim of this article is to present a new Neural Machine Translation (NMT) engine from Spanish into Galician for the social media domain that was trained with a Twitter corpus. Our main goal is to outline the methods used to build the corpus and the steps taken to train the engine in a low-resource language context. We have evaluated the engine's performance both with regular automatic metrics and with a new methodology based on the non-inferiority process, and contrasted this information with an error classification human evaluation conducted by professional linguists. We will present the steps carried out following the conclusions of a previous pilot study, describe the new process followed, analyze the new engine and present the final conclusions.
Anthology ID:
2023.eamt-1.26
Volume:
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Month:
June
Year:
2023
Address:
Tampere, Finland
Editors:
Mary Nurminen, Judith Brenner, Maarit Koponen, Sirkku Latomaa, Mikhail Mikhailov, Frederike Schierl, Tharindu Ranasinghe, Eva Vanmassenhove, Sergi Alvarez Vidal, Nora Aranberri, Mara Nunziatini, Carla Parra Escartín, Mikel Forcada, Maja Popovic, Carolina Scarton, Helena Moniz
Venue:
EAMT
Publisher:
European Association for Machine Translation
Pages:
269–274
URL:
https://aclanthology.org/2023.eamt-1.26
Cite (ACL):
María Do Campo Bayón and Pilar Sánchez-Gijón. 2023. A social media NMT engine for a low-resource language combination. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 269–274, Tampere, Finland. European Association for Machine Translation.
Cite (Informal):
A social media NMT engine for a low-resource language combination (Bayón & Sánchez-Gijón, EAMT 2023)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2023.eamt-1.26.pdf