TDDC: Timely Disclosure Documents Corpus

Nobushige Doi, Yusuke Oda, Toshiaki Nakazawa


Abstract
In this paper, we describe the Timely Disclosure Documents Corpus (TDDC) in detail. TDDC was prepared by manually aligning sentences from past Japanese and English timely disclosure documents, published in PDF format by companies listed on the Tokyo Stock Exchange. TDDC consists of approximately 1.4 million parallel sentences in Japanese and English. TDDC was used as the official dataset for the 6th Workshop on Asian Translation to encourage the development of machine translation.
Anthology ID:
2020.lrec-1.459
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3719–3726
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.459
Cite (ACL):
Nobushige Doi, Yusuke Oda, and Toshiaki Nakazawa. 2020. TDDC: Timely Disclosure Documents Corpus. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3719–3726, Marseille, France. European Language Resources Association.
Cite (Informal):
TDDC: Timely Disclosure Documents Corpus (Doi et al., LREC 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.459.pdf