Abstract
We propose a pre-processing stage for Statistical Machine Translation (SMT) systems in which the words of the source sentence are re-ordered according to the syntax of the target language before alignment, so that the alignment found by the statistical system improves. We take a dependency parse of the source sentence and linearize it according to the syntax of the target language before it is used in either the training or the decoding phase. During this linearization, ordering decisions among dependency nodes that share a parent are based on two aspects: parent-child positioning and relation priority. To make the linearization process rule-driven, we assume that the relative word order of a dependency relation's relata depends neither on the semantic properties of the relata nor on the rest of the expression. We also assume that the relative word order of various relations sharing a relatum does not depend on the rest of the expression. We experiment with a publicly available English-Hindi parallel corpus and show that our scheme improves the BLEU score.
- Anthology ID:
- L12-1163
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 2164–2171
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/340_Paper.pdf
- Cite (ACL):
- Amit Sangodkar and Om Damani. 2012. Re-ordering Source Sentences for SMT. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2164–2171, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- Re-ordering Source Sentences for SMT (Sangodkar & Damani, LREC 2012)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/340_Paper.pdf
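
The linearization outlined in the abstract can be illustrated with a minimal sketch. It assumes a toy dependency tree and two hypothetical rule tables: `BEFORE_PARENT` stands in for parent-child positioning (which relations place the child before its parent), and `PRIORITY` stands in for relation priority (how siblings on the same side of the parent are ordered). The relation labels, the rule values, the `Node` class, and the `linearize` function are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One word in a dependency tree, with its relation to its parent."""
    word: str
    relation: str = "root"
    children: list = field(default_factory=list)


# Hypothetical parent-child positioning rule for a verb-final target language:
# children with these relations are placed before their parent; all other
# children are placed after it.
BEFORE_PARENT = {"nsubj", "dobj"}

# Hypothetical relation priority: among siblings on the same side of the
# parent, a lower value is emitted earlier. Unlisted relations come last.
PRIORITY = {"nsubj": 0, "dobj": 1}


def linearize(node):
    """Return the words of the subtree rooted at `node` in target-style order."""
    def rank(child):
        return PRIORITY.get(child.relation, len(PRIORITY))

    # sorted() is stable, so siblings with equal priority keep source order.
    before = sorted((c for c in node.children if c.relation in BEFORE_PARENT), key=rank)
    after = sorted((c for c in node.children if c.relation not in BEFORE_PARENT), key=rank)

    words = []
    for child in before:
        words.extend(linearize(child))
    words.append(node.word)
    for child in after:
        words.extend(linearize(child))
    return words


# English source order is subject-verb-object: "Ram eats mangoes".
sentence = Node("eats", children=[Node("mangoes", "dobj"), Node("Ram", "nsubj")])
reordered = linearize(sentence)
print(" ".join(reordered))
```

Because each decision looks only at a relation label and never at the words themselves or the rest of the sentence, the sketch mirrors the two independence assumptions stated in the abstract. With these toy rules the subject-verb-object input comes out in subject-object-verb order, the basic clause order of Hindi.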