Abstract
We present a new method for aligning sentences with their translations in a parallel bilingual corpus. Previous approaches have generally been based either on sentence length or word correspondences. Sentence-length-based methods are relatively fast and fairly accurate. Word-correspondence-based methods are generally more accurate but much slower, and usually depend on cognates or a bilingual lexicon. Our method adapts and combines these approaches, achieving high accuracy at a modest computational cost, and requiring no knowledge of the languages or the corpus beyond division into words and sentences.
- Anthology ID:
- 2002.amta-papers.14
- Volume:
- Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
- Month:
- October 8-12
- Year:
- 2002
- Address:
- Tiburon, USA
- Editor:
- Stephen D. Richardson
- Venue:
- AMTA
- Publisher:
- Springer
- Pages:
- 135–144
- URL:
- https://link.springer.com/chapter/10.1007/3-540-45820-4_14
- Cite (ACL):
- Robert C. Moore. 2002. Fast and accurate sentence alignment of bilingual corpora. In Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 135–144, Tiburon, USA. Springer.
- Cite (Informal):
- Fast and accurate sentence alignment of bilingual corpora (Moore, AMTA 2002)
- PDF:
- https://link.springer.com/chapter/10.1007/3-540-45820-4_14