Abstract
Many studies have been reported in the domain of speech-to-speech machine translation systems for travel conversation use. Therefore, a large number of travel domain corpora have become available in recent years. From a wider viewpoint, speech-to-speech systems are required for many purposes other than travel conversation. One of these is monologues (e.g., TV news, lectures, technical presentations). However, in monologues, sentences tend to be long and complicated, which often causes problems for parsing and translation. Therefore, we need a suitable translation unit, rather than the sentence. We propose the clause as a unit for translation. To develop a speech-to-speech machine translation system for monologues based on the clause as the translation unit, we need a monologue parallel corpus with clause alignment. In this paper, we describe how to build a Japanese-English monologue parallel corpus with clauses aligned, and discuss the features of this corpus.
- Anthology ID:
- 2003.mtsummit-papers.29
- Volume:
- Proceedings of Machine Translation Summit IX: Papers
- Month:
- September 23-27
- Year:
- 2003
- Address:
- New Orleans, USA
- Venue:
- MTSummit
- URL:
- https://aclanthology.org/2003.mtsummit-papers.29
- Cite (ACL):
- Hideki Kashioka, Takehiko Maruyama, and Hideki Tanaka. 2003. Building a parallel corpus for monologues with clause alignment. In Proceedings of Machine Translation Summit IX: Papers, New Orleans, USA.
- Cite (Informal):
- Building a parallel corpus for monologues with clause alignment (Kashioka et al., MTSummit 2003)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/2003.mtsummit-papers.29.pdf