Word Alignment Viewer for Long Sentences

Hideki Kashioka


Abstract
An aligned corpus is an important resource for developing machine translation systems. By observing an aligned parallel corpus, we consider which units are suitable for constructing the translation model, and we examine the characteristics of the aligned corpus. Long sentences are especially difficult for word alignment because they can become very complicated. In addition, in a long sentence each source (or target) word has more candidate words on the target (or source) side to which it may correspond. This paper introduces an alignment viewer that a developer can use to correct alignment information. We discuss using the viewer on a patent parallel corpus, because sentences in patents are often long and complicated.
Anthology ID:
2005.mtsummit-posters.15
Volume:
Proceedings of Machine Translation Summit X: Posters
Month:
September 13-15
Year:
2005
Address:
Phuket, Thailand
Venue:
MTSummit
Pages:
427–431
URL:
https://aclanthology.org/2005.mtsummit-posters.15
Cite (ACL):
Hideki Kashioka. 2005. Word Alignment Viewer for Long Sentences. In Proceedings of Machine Translation Summit X: Posters, pages 427–431, Phuket, Thailand.
Cite (Informal):
Word Alignment Viewer for Long Sentences (Kashioka, MTSummit 2005)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2005.mtsummit-posters.15.pdf