Building an Annotated Japanese-Chinese Parallel Corpus – A Part of NICT Multilingual Corpora

Yujie Zhang, Kiyotaka Uchimoto, Qing Ma, Hitoshi Isahara


Abstract
We are constructing a Japanese-Chinese parallel corpus as part of the NICT Multilingual Corpora. The corpus covers the general domain, is large in scale (about 40,000 sentence pairs), consists of long sentences, is annotated with detailed information, and is of high quality. To the best of our knowledge, this will be the first annotated Japanese-Chinese parallel corpus in the world. We created the corpus by selecting Japanese sentences from the Mainichi Newspaper and then manually translating them into Chinese. We then annotated the corpus with morphological and syntactic structures and with alignments at the word and phrase levels. This paper describes the specifications for human translation and for the annotation of detailed information, as well as the tools we developed in the project. We also present the experience we gained and the points to which we paid special attention, in order to share them with other researchers working on corpus construction.
Anthology ID:
2005.mtsummit-papers.10
Volume:
Proceedings of Machine Translation Summit X: Papers
Month:
September 13-15
Year:
2005
Address:
Phuket, Thailand
Venue:
MTSummit
Pages:
71–78
URL:
https://aclanthology.org/2005.mtsummit-papers.10
Cite (ACL):
Yujie Zhang, Kiyotaka Uchimoto, Qing Ma, and Hitoshi Isahara. 2005. Building an Annotated Japanese-Chinese Parallel Corpus – A Part of NICT Multilingual Corpora. In Proceedings of Machine Translation Summit X: Papers, pages 71–78, Phuket, Thailand.
Cite (Informal):
Building an Annotated Japanese-Chinese Parallel Corpus – A Part of NICT Multilingual Corpora (Zhang et al., MTSummit 2005)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2005.mtsummit-papers.10.pdf