Abstract
We explore the use of two independent subsystems, Byte Pair Encoding (BPE) and Morfessor, as sources of basic units for subword-level neural machine translation (NMT). We show that, for linguistically distant language pairs, Morfessor-based segmentation produces significantly better translation quality than BPE. For closely related language pairs, however, BPE-based subword NMT may translate better than Morfessor-based subword NMT. We propose a combined approach based on these two segmentation algorithms, Morfessor-BPE (M-BPE), which outperforms both baseline systems in terms of BLEU score. Our results are supported by experiments on three language pairs: English-Hindi, Bengali-Hindi, and English-Bengali.
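The abstract compares two subword segmenters as sources of translation units. As a rough illustration only, the sketch below segments the same text with a Morfessor Baseline model (via the `morfessor` Python package) and with BPE (via the `subword_nmt` package); the corpus path `train.en`, the 10k merge count, and the choice of these particular toolkits are assumptions for the example, not details taken from the paper, and the exact M-BPE combination is not shown here.

```python
# Illustrative sketch (not the authors' exact pipeline): segment the same
# text with Morfessor and with BPE, the two subword units compared above.
# Assumes `pip install morfessor subword-nmt`; the corpus path and the
# number of BPE merges are placeholders, not values from the paper.
import io

import morfessor
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

CORPUS = "train.en"  # hypothetical tokenised training file, one sentence per line

# --- Morfessor Baseline: unsupervised, morphology-like segmentation ---
mio = morfessor.MorfessorIO()
model = morfessor.BaselineModel()
model.load_data(list(mio.read_corpus_file(CORPUS)))
model.train_batch()

def morfessor_segment(sentence):
    # viterbi_segment returns (list_of_morphs, cost) for each word
    return " ".join(" ".join(model.viterbi_segment(w)[0]) for w in sentence.split())

# --- BPE: learn merge operations on the raw corpus, then apply them ---
codes = io.StringIO()
with open(CORPUS, encoding="utf-8") as f:
    learn_bpe(f, codes, 10000)  # 10k merges is an arbitrary choice for the sketch
codes.seek(0)
bpe = BPE(codes)

def bpe_segment(sentence):
    # subword-nmt marks non-final subwords with the '@@' separator
    return bpe.process_line(sentence)

sent = "unbelievable translations"
print("Morfessor:", morfessor_segment(sent))
print("BPE      :", bpe_segment(sent))
```

The two functions produce the alternative subword views of the training and test data that a subword-level NMT system would then be trained on.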
- Anthology ID: W18-1207
- Volume: Proceedings of the Second Workshop on Subword/Character LEvel Models
- Month: June
- Year: 2018
- Address: New Orleans
- Editors: Manaal Faruqui, Hinrich Schütze, Isabel Trancoso, Yulia Tsvetkov, Yadollah Yaghoobzadeh
- Venue: SCLeM
- Publisher: Association for Computational Linguistics
- Pages: 55–60
- URL: https://aclanthology.org/W18-1207
- DOI: 10.18653/v1/W18-1207
- Cite (ACL): Tamali Banerjee and Pushpak Bhattacharyya. 2018. Meaningless yet meaningful: Morphology grounded subword-level NMT. In Proceedings of the Second Workshop on Subword/Character LEvel Models, pages 55–60, New Orleans. Association for Computational Linguistics.
- Cite (Informal): Meaningless yet meaningful: Morphology grounded subword-level NMT (Banerjee & Bhattacharyya, SCLeM 2018)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/W18-1207.pdf