Building a parallel corpus for monologues with clause alignment

Hideki Kashioka, Takehiko Maruyama, Hideki Tanaka


Abstract
Many studies have been reported on speech-to-speech machine translation systems for travel conversation, and as a result a large number of travel-domain corpora have become available in recent years. From a wider viewpoint, however, speech-to-speech systems are needed for many purposes other than travel conversation. One of these is monologues (e.g., TV news, lectures, and technical presentations). In monologues, sentences tend to be long and complicated, which often causes problems for parsing and translation. We therefore need a translation unit better suited to such input than the sentence, and we propose the clause as that unit. To develop a speech-to-speech machine translation system for monologues that uses the clause as its translation unit, we need a monologue parallel corpus with clause alignment. In this paper, we describe how to build a Japanese-English monologue parallel corpus with aligned clauses, and we discuss the features of this corpus.
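The abstract does not say how the clause alignments are stored. As a rough illustration only, the following Python sketch shows one possible in-memory representation of a clause-aligned Japanese-English sentence pair and how aligned clauses could be read off as translation units. The class name `AlignedSentencePair`, its fields, and the link format are hypothetical and are not the corpus format described in the paper; the example sentence pair is invented for illustration and is not taken from the corpus.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record format for illustration; not the format used in the paper.
# A link pairs a group of Japanese clause indices with a group of English
# clause indices, so 1:1, 1:n and n:m correspondences can all be expressed.
Link = Tuple[Tuple[int, ...], Tuple[int, ...]]


@dataclass
class AlignedSentencePair:
    ja_clauses: List[str]  # Japanese sentence split into clauses, in original order
    en_clauses: List[str]  # English translation split into clauses, in original order
    links: List[Link] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a link is empty or points outside the clause lists."""
        for ja_idx, en_idx in self.links:
            if not ja_idx or not en_idx:
                raise ValueError("each link must cover at least one clause per side")
            if any(i < 0 or i >= len(self.ja_clauses) for i in ja_idx):
                raise ValueError(f"Japanese clause index out of range: {ja_idx}")
            if any(j < 0 or j >= len(self.en_clauses) for j in en_idx):
                raise ValueError(f"English clause index out of range: {en_idx}")

    def translation_units(self) -> List[Tuple[str, str]]:
        """Return one (Japanese, English) text pair per alignment link."""
        self.validate()
        units = []
        for ja_idx, en_idx in self.links:
            # Japanese is written without spaces; English clauses are space-joined.
            ja_text = "".join(self.ja_clauses[i] for i in sorted(ja_idx))
            en_text = " ".join(self.en_clauses[j] for j in sorted(en_idx))
            units.append((ja_text, en_text))
        return units


if __name__ == "__main__":
    # Invented example: the subordinate clause comes first in Japanese
    # but second in English, so the links cross.
    pair = AlignedSentencePair(
        ja_clauses=["雨が降ったので、", "試合は中止になった。"],
        en_clauses=["The game was cancelled", "because it rained."],
        links=[((0,), (1,)), ((1,), (0,))],
    )
    for ja, en in pair.translation_units():
        print(f"{ja} -> {en}")
```

Storing index groups rather than assuming a positional one-to-one mapping is a design choice made for this sketch: Japanese and English often order subordinate and main clauses differently, as in the example, so clause position alone would not identify the aligned pairs.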
Anthology ID: 2003.mtsummit-papers.29
Volume: Proceedings of Machine Translation Summit IX: Papers
Month: September 23-27
Year: 2003
Address: New Orleans, USA
Venue: MTSummit
URL: https://aclanthology.org/2003.mtsummit-papers.29
Cite (ACL): Hideki Kashioka, Takehiko Maruyama, and Hideki Tanaka. 2003. Building a parallel corpus for monologues with clause alignment. In Proceedings of Machine Translation Summit IX: Papers, New Orleans, USA.
Cite (Informal): Building a parallel corpus for monologues with clause alignment (Kashioka et al., MTSummit 2003)
PDF: https://preview.aclanthology.org/ingest-bitext-workshop/2003.mtsummit-papers.29.pdf