Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017

Satoshi Kinoshita, Tadaaki Oshio, Tomoharu Mitsuhashi


Abstract
Japio participates in the patent subtasks (JPC-EJ/JE/CJ/KJ) with phrase-based statistical machine translation (SMT) and neural machine translation (NMT) systems trained on its own patent corpora in addition to the subtask corpora provided by the organizers of WAT2017. In the EJ and CJ subtasks, the SMT and NMT systems, trained on about 50 million and 10 million sentence pairs respectively, achieved comparable scores in automatic evaluation, but the NMT systems were superior to the SMT systems in both the official and in-house human evaluations.
Anthology ID:
W17-5713
Volume:
Proceedings of the 4th Workshop on Asian Translation (WAT2017)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Venue:
WAT
Publisher:
Asian Federation of Natural Language Processing
Pages:
140–145
URL:
https://aclanthology.org/W17-5713
Cite (ACL):
Satoshi Kinoshita, Tadaaki Oshio, and Tomoharu Mitsuhashi. 2017. Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 140–145, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017 (Kinoshita et al., WAT 2017)
PDF:
https://preview.aclanthology.org/nodalida-main-page/W17-5713.pdf