Abstract
The role of language models in SMT is to promote fluent translation output, but traditional n-gram language models are unable to capture fluency phenomena between distant words, such as some morphological agreement phenomena, subcategorisation, and syntactic collocations with string-level gaps. Syntactic language models have the potential to fill this modelling gap. We propose a language model for dependency structures that is relational rather than configurational and thus particularly suited for languages with a (relatively) free word order. It is trainable with neural networks, and not only improves over standard n-gram language models, but also outperforms related syntactic language models. We empirically demonstrate its effectiveness in terms of perplexity and as a feature function in string-to-tree SMT from English to German and Russian. We also show that using a syntactic evaluation metric to tune the log-linear parameters of an SMT system further increases translation quality when coupled with a syntactic language model.
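The core idea is that each word is scored conditioned on its ancestors in the dependency tree rather than on its linear predecessors, so words that are far apart in the string but syntactically adjacent stay within the conditioning context. Below is a minimal, count-based Python sketch of that idea; the tree encoding, the toy corpus, and the add-one smoothing are illustrative assumptions, not the paper's method (which is a relational model trained with neural networks).

```python
# Sketch: score words on their dependency ancestors instead of the
# preceding surface words. Everything here (tree format, toy data,
# add-one smoothing) is an assumption for illustration only.

from collections import defaultdict
import math

# A toy dependency tree for "the black cat sleeps":
# heads[i] is the index of token i's head, or None for the root.
tokens = ["the", "black", "cat", "sleeps"]
heads = [2, 2, 3, None]

def ancestor_context(i, heads, tokens, depth=2):
    """Return up to `depth` ancestors of token i, nearest first."""
    context = []
    h = heads[i]
    while h is not None and len(context) < depth:
        context.append(tokens[h])
        h = heads[h]
    return tuple(context)

# Collect (ancestor context -> word) counts from the toy "corpus".
counts = defaultdict(lambda: defaultdict(int))
for i, w in enumerate(tokens):
    counts[ancestor_context(i, heads, tokens)][w] += 1

def log_prob(i):
    """Add-one smoothed P(word | dependency ancestors)."""
    ctx = ancestor_context(i, heads, tokens)
    c = counts[ctx]
    vocab_size = len(set(tokens))
    return math.log((c[tokens[i]] + 1) / (sum(c.values()) + vocab_size))

sentence_lp = sum(log_prob(i) for i in range(len(tokens)))
print(f"log P(sentence) = {sentence_lp:.3f}")
```

Note how "the" and "black" are both scored against the context ("cat", "sleeps"), regardless of how much string material separates them from their head; a linear n-gram model would condition them on different, purely positional histories.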
- Anthology ID: Q15-1013
- Volume: Transactions of the Association for Computational Linguistics, Volume 3
- Year: 2015
- Address: Cambridge, MA
- Venue: TACL
- Publisher: MIT Press
- Pages: 169–182
- URL: https://aclanthology.org/Q15-1013
- DOI: 10.1162/tacl_a_00131
- Cite (ACL): Rico Sennrich. 2015. Modelling and Optimizing on Syntactic N-Grams for Statistical Machine Translation. Transactions of the Association for Computational Linguistics, 3:169–182.
- Cite (Informal): Modelling and Optimizing on Syntactic N-Grams for Statistical Machine Translation (Sennrich, TACL 2015)
- PDF: https://preview.aclanthology.org/author-url/Q15-1013.pdf
- Data: WMT 2014