Abstract
Statistical machine translation (SMT) requires a parallel corpus between the source and target languages. Although a pivot-translation approach can be applied to a language pair that does not have a parallel corpus directly between them, it requires both source-pivot and pivot-target parallel corpora. We propose a novel approach to apply SMT to a resource-limited source language that has no parallel corpus but has only a word dictionary for the pivot language. The problems with dictionary-based translations lie in their ambiguity and incompleteness. The proposed method uses a word lattice representation of the pivot-language candidates and word lattice decoding to deal with the ambiguity; the lattice expansion is accomplished by using a pivot-target phrase translation table to compensate for the incompleteness. Our experimental evaluation showed that this approach is promising for applying SMT, even when a source-side parallel corpus is lacking.

- Anthology ID: L12-1393
- Volume: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month: May
- Year: 2012
- Address: Istanbul, Turkey
- Venue: LREC
- Publisher: European Language Resources Association (ELRA)
- Pages: 3929–3932
- URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/677_Paper.pdf
- Cite (ACL): Takanori Kusumoto and Tomoyosi Akiba. 2012. Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3929–3932, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal): Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension (Kusumoto & Akiba, LREC 2012)
- PDF: http://www.lrec-conf.org/proceedings/lrec2012/pdf/677_Paper.pdf