The JHU Submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education

Huda Khayrallah, Jacob Bremerman, Arya D. McCarthy, Kenton Murray, Winston Wu, Matt Post


Abstract
This paper presents the Johns Hopkins University submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education (STAPLE). We participated in all five language tasks, placing first in each. Our approach involved a language-agnostic pipeline of three components: (1) building strong machine translation systems on general-domain data, (2) fine-tuning on Duolingo-provided data, and (3) generating n-best lists which are then filtered with various score-based techniques. In addition to the language-agnostic pipeline, we attempted a number of linguistically-motivated approaches, with, unfortunately, little success. We also find that improving BLEU performance of the beam-search generated translation does not necessarily improve on the task metric, the weighted macro F1 of an n-best list.
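As a rough illustration of step (3), the sketch below filters an n-best list by keeping hypotheses whose model score is close to that of the best hypothesis. The abstract does not specify which score-based techniques were used, so the function name `filter_nbest`, the `margin` and `max_keep` parameters, and the log-probability scores are all assumptions made for the example, not the authors' method.

```python
from typing import List, Tuple


def filter_nbest(
    nbest: List[Tuple[str, float]],
    margin: float = 2.0,
    max_keep: int = 10,
) -> List[str]:
    """Keep hypotheses whose score is within `margin` of the best score.

    `nbest` holds (translation, log-probability) pairs. Both the margin
    rule and the cap on list size are hypothetical choices for this sketch.
    """
    if not nbest:
        return []
    # Rank hypotheses from highest to lowest log-probability.
    ranked = sorted(nbest, key=lambda pair: pair[1], reverse=True)
    best_score = ranked[0][1]
    # Drop hypotheses that fall too far below the best one.
    kept = [text for text, score in ranked if best_score - score <= margin]
    return kept[:max_keep]


# Made-up scores, purely for illustration.
example = [("a", -0.5), ("b", -1.0), ("c", -4.0)]
print(filter_nbest(example, margin=2.0))
```

A relative threshold like this trades recall for precision: a larger `margin` admits more paraphrases, which matters when the evaluation metric scores the whole list rather than only the top translation.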
Anthology ID:
2020.ngt-1.22
Volume:
Proceedings of the Fourth Workshop on Neural Generation and Translation
Month:
July
Year:
2020
Address:
Online
Editors:
Alexandra Birch, Andrew Finch, Hiroaki Hayashi, Kenneth Heafield, Marcin Junczys-Dowmunt, Ioannis Konstas, Xian Li, Graham Neubig, Yusuke Oda
Venue:
NGT
Publisher:
Association for Computational Linguistics
Pages:
188–197
URL:
https://aclanthology.org/2020.ngt-1.22
DOI:
10.18653/v1/2020.ngt-1.22
Cite (ACL):
Huda Khayrallah, Jacob Bremerman, Arya D. McCarthy, Kenton Murray, Winston Wu, and Matt Post. 2020. The JHU Submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education. In Proceedings of the Fourth Workshop on Neural Generation and Translation, pages 188–197, Online. Association for Computational Linguistics.
Cite (Informal):
The JHU Submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education (Khayrallah et al., NGT 2020)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/2020.ngt-1.22.pdf
Video:
http://slideslive.com/38929836
Data
Duolingo STAPLE Shared Task