Feature-Based Decipherment for Machine Translation

Iftekhar Naim; Parker Riley; Daniel Gildea

doi:10.1162/coli_a_00326

Abstract

Orthographic similarities across languages provide a strong signal for unsupervised probabilistic transduction (decipherment) for closely related language pairs. The existing decipherment models, however, are not well suited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is computationally expensive for the proposed log-linear model. To address this challenge, we perform approximate inference via Markov chain Monte Carlo sampling and contrastive divergence. Our results show that the proposed log-linear model with contrastive divergence outperforms the existing generative decipherment models by exploiting the orthographic features. The model both scales to large vocabularies and preserves accuracy in low- and no-resource contexts.

Anthology ID: J18-3006
Volume: Computational Linguistics, Volume 44, Issue 3 - September 2018
Month: September
Year: 2018
Address: Cambridge, MA
Venue: CL
Publisher: MIT Press
Pages: 525–546
URL: https://aclanthology.org/J18-3006
DOI: 10.1162/coli_a_00326
Cite (ACL): Iftekhar Naim, Parker Riley, and Daniel Gildea. 2018. Feature-Based Decipherment for Machine Translation. Computational Linguistics, 44(3):525–546.
Cite (Informal): Feature-Based Decipherment for Machine Translation (Naim et al., CL 2018)
PDF: https://preview.aclanthology.org/ingestion-script-update/J18-3006.pdf
