Fast and accurate sentence alignment of bilingual corpora

Robert C. Moore


Abstract
We present a new method for aligning sentences with their translations in a parallel bilingual corpus. Previous approaches have generally been based either on sentence length or word correspondences. Sentence-length-based methods are relatively fast and fairly accurate. Word-correspondence-based methods are generally more accurate but much slower, and usually depend on cognates or a bilingual lexicon. Our method adapts and combines these approaches, achieving high accuracy at a modest computational cost, and requiring no knowledge of the languages or the corpus beyond division into words and sentences.
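The abstract contrasts sentence-length-based alignment with word-correspondence methods. As a rough illustration of the length-based family only, here is a minimal Python sketch of a generic dynamic-programming aligner over sentence lengths measured in words. It is not the method of this paper. The Poisson length model, the allowed bead types, the penalty values in `BEADS`, and the names `length_cost` and `align` are all illustrative assumptions.

```python
import math

# Assumed bead types (number of source sentences, number of target sentences)
# with hypothetical extra penalties; 1-1 beads are preferred.
BEADS = {(1, 1): 0.0, (1, 0): 5.0, (0, 1): 5.0, (2, 1): 2.0, (1, 2): 2.0}


def length_cost(src_len, tgt_len, ratio):
    """Negative log Poisson probability of tgt_len given mean src_len * ratio (assumed model)."""
    mean = max(src_len * ratio, 1e-9)  # guard against log(0)
    return mean - tgt_len * math.log(mean) + math.lgamma(tgt_len + 1)


def align(src, tgt, ratio=None):
    """Align two lists of sentence lengths (in words) by minimum total cost.

    Returns a list of beads, each a pair (source indices, target indices).
    """
    if ratio is None:
        # Estimate the target/source length ratio from the whole text.
        ratio = sum(tgt) / max(sum(src), 1)
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Cells are visited in an order where every predecessor is already final.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    step = penalty  # unmatched sentence: fixed penalty only
                else:
                    step = penalty + length_cost(sum(src[i:ni]), sum(tgt[j:nj]), ratio)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Trace back the cheapest path from the end of both texts.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]
```

For example, `align([10, 12, 8], [11, 12, 9])` pairs the sentences one-to-one, while `align([10, 10], [21])` merges both source sentences into a single 2-1 bead. A purely length-based sketch like this uses no word identities at all; the abstract's point is that combining such a fast length-based pass with word correspondences improves accuracy at modest cost.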
Anthology ID: 2002.amta-papers.14
Volume: Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month: October 8-12
Year: 2002
Address: Tiburon, USA
Editor: Stephen D. Richardson
Venue: AMTA
Publisher: Springer
Pages: 135–144
URL: https://link.springer.com/chapter/10.1007/3-540-45820-4_14
Cite (ACL): Robert C. Moore. 2002. Fast and accurate sentence alignment of bilingual corpora. In Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 135–144, Tiburon, USA. Springer.
Cite (Informal): Fast and accurate sentence alignment of bilingual corpora (Moore, AMTA 2002)