Building English-to-Serbian Machine Translation System for IMDb Movie Reviews

Pintu Lohar; Maja Popović; Andy Way

doi:10.18653/v1/W19-3715

Abstract

This paper reports the results of the first experiment dealing with the challenges of building a machine translation system for user-generated content involving a complex South Slavic language. We focus on translating English IMDb user movie reviews into Serbian in a low-resource scenario. We explore the potential and limits of (i) phrase-based and neural machine translation systems trained on clean out-of-domain parallel data from news articles, and (ii) creating an additional synthetic in-domain parallel corpus by machine-translating the English IMDb corpus into Serbian. Our main findings are that morphology and syntax are handled better by the neural approach than by the phrase-based approach, even in this low-resource, mismatched-domain scenario; the situation is different, however, for the lexical aspect, especially for person names. This finding also indicates that, in general, machine translation of person names into Slavic languages (especially those which require or allow transcription) should be investigated more systematically.

Anthology ID: W19-3715
Volume: Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
Month: August
Year: 2019
Address: Florence, Italy
Venue: BSNLP
SIG: SIGSLAV
Publisher: Association for Computational Linguistics
Pages: 105–113
URL: https://aclanthology.org/W19-3715
DOI: 10.18653/v1/W19-3715
Cite (ACL): Pintu Lohar, Maja Popović, and Andy Way. 2019. Building English-to-Serbian Machine Translation System for IMDb Movie Reviews. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, pages 105–113, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Building English-to-Serbian Machine Translation System for IMDb Movie Reviews (Lohar et al., BSNLP 2019)
PDF: https://preview.aclanthology.org/paclic-22-ingestion/W19-3715.pdf
Code: m-popovic/imdb-corpus-for-MT
Data: IMDb Movie Reviews