Abstract
We propose a two-stage system for spoken language machine translation. In the first stage, the source sentence is parsed and paraphrased into an intermediate language that retains the words of the source language but follows the word order of the target language as far as feasible. This stage is mostly linguistic. In the second stage, a statistical MT system translates the intermediate language into the target language. For the task of English-to-Mandarin translation, we achieved a 2.5-point increase in BLEU score and a 45% decrease in GIZA-Alignment Crossover on IWSLT-06 data. In a human evaluation of the sentences that differed, the two-stage system was preferred three times as often as the baseline.
- Anthology ID:
- 2008.amta-papers.21
- Volume:
- Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
- Month:
- October 21-25
- Year:
- 2008
- Address:
- Waikiki, USA
- Venue:
- AMTA
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 222–231
- URL:
- https://aclanthology.org/2008.amta-papers.21
- Cite (ACL):
- Yushi Xu and Stephanie Seneff. 2008. Two-Stage Translation: A Combined Linguistic and Statistical Machine Translation Framework. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 222–231, Waikiki, USA. Association for Machine Translation in the Americas.
- Cite (Informal):
- Two-Stage Translation: A Combined Linguistic and Statistical Machine Translation Framework (Xu & Seneff, AMTA 2008)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2008.amta-papers.21.pdf