Comparison between NMT and PBSMT Performance for Translating Noisy User-Generated Content

José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski


Abstract
This work compares the performance achieved by Phrase-Based Statistical Machine Translation systems (PBSMT) and attention-based Neural Machine Translation systems (NMT) when translating User-Generated Content (UGC), as encountered on social media, from French to English. We show that, contrary to what could be expected, PBSMT outperforms NMT when translating non-canonical inputs. Our error analysis uncovers the specificities of UGC that are problematic for sequential NMT architectures and suggests new avenues for improving NMT models.
Anthology ID:
W19-6101
Volume:
Proceedings of the 22nd Nordic Conference on Computational Linguistics
Month:
September–October
Year:
2019
Address:
Turku, Finland
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press
Pages:
2–14
URL:
https://aclanthology.org/W19-6101
Cite (ACL):
José Carlos Rosales Núñez, Djamé Seddah, and Guillaume Wisniewski. 2019. Comparison between NMT and PBSMT Performance for Translating Noisy User-Generated Content. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 2–14, Turku, Finland. Linköping University Electronic Press.
Cite (Informal):
Comparison between NMT and PBSMT Performance for Translating Noisy User-Generated Content (Rosales Núñez et al., NoDaLiDa 2019)
PDF:
https://preview.aclanthology.org/remove-xml-comments/W19-6101.pdf
Data
MTNT