Large aligned treebanks for syntax-based machine translation
Gideon Kotzé, Vincent Vandeghinste, Scott Martens, Jörg Tiedemann
Abstract
We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the nonterminal constituent level for use in syntax-based machine translation. We describe how they were constructed and applied to a syntax- and example-based machine translation system called Parse and Corpus-Based Machine Translation (PaCo-MT). For the language pair Dutch to English, we present evaluation scores of both the nonterminal constituent alignments and the MT system itself, and in the latter case, compare them with those of Moses, a current state-of-the-art statistical MT system, when trained on the same data.
- Anthology ID:
- L12-1553
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 467–473
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/924_Paper.pdf
- Cite (ACL):
- Gideon Kotzé, Vincent Vandeghinste, Scott Martens, and Jörg Tiedemann. 2012. Large aligned treebanks for syntax-based machine translation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 467–473, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- Large aligned treebanks for syntax-based machine translation (Kotzé et al., LREC 2012)