Abstract
An aligned corpus is an important resource for developing machine translation systems. By observing an aligned parallel corpus, we consider which units are suitable for constructing the translation model, and we examine the characteristics of the aligned corpus. Long sentences are especially difficult for word alignment because they can become very complicated. In addition, each source (or target) word in a long sentence has more candidate target (or source) words to which it may correspond. This paper introduces an alignment viewer that a developer can use to correct alignment information. We discuss using the viewer on a patent parallel corpus, because sentences in patents are often long and complicated.
- Anthology ID:
- 2005.mtsummit-posters.15
- Volume:
- Proceedings of Machine Translation Summit X: Posters
- Month:
- September 13-15
- Year:
- 2005
- Address:
- Phuket, Thailand
- Venue:
- MTSummit
- Pages:
- 427–431
- URL:
- https://aclanthology.org/2005.mtsummit-posters.15
- Cite (ACL):
- Hideki Kashioka. 2005. Word Alignment Viewer for Long Sentences. In Proceedings of Machine Translation Summit X: Posters, pages 427–431, Phuket, Thailand.
- Cite (Informal):
- Word Alignment Viewer for Long Sentences (Kashioka, MTSummit 2005)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2005.mtsummit-posters.15.pdf