Shuyue Stella Li


A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extracters
Shuyue Stella Li | Beining Xu | Xiangyu Zhang | Hexin Liu | Wenhan Chao | Paola Garcia
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)

Learning from Mistakes: Towards Robust Neural Machine Translation for Disfluent L2 Sentences
Shuyue Stella Li | Philipp Koehn
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

We study sentences written by second-language (L2) learners to improve the robustness of current neural machine translation (NMT) models on this type of data. The large datasets currently used to train NMT systems consist mostly of Wikipedia articles or government documents written by highly competent speakers of the language, especially English. However, because English is the most common second language, it is crucial that machine translation systems be robust to the large number of sentences written by L2 learners of English. By studying the difficulties humans face during L2 acquisition, we transfer such insights to machine translation systems so that they can recover from source-side fluency variations. In this work, we create additional training data with artificial errors similar to the mistakes made by L2 learners of various fluency levels to improve the quality of the machine translation system. We test our method in zero-shot settings on the JFLEG-es (English-Spanish) dataset. On disfluent sentences, our machine translation system outperforms the baseline by 1.8 BLEU points.
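
The idea of augmenting training data with artificial learner-like errors can be illustrated with a minimal sketch. This is not the authors' implementation: the error types (article deletion and preposition substitution), the word lists, and the names `inject_errors` and `augment` are assumptions chosen purely for illustration, and the abstract does not specify which error types or rates were actually used.

```python
import random

# Hypothetical word lists; the paper's actual error inventory is not given in the abstract.
ARTICLES = {"a", "an", "the"}
PREPOSITIONS = ["in", "on", "at", "to", "for", "of"]


def inject_errors(sentence, error_rate, rng):
    """Return a noisy copy of `sentence`.

    Each token is selected for corruption with probability `error_rate`.
    Selected articles are dropped, selected prepositions are swapped for a
    different one, and all other tokens are left unchanged.
    """
    noisy = []
    for token in sentence.split():
        lower = token.lower()
        if rng.random() >= error_rate:
            noisy.append(token)  # token not selected for corruption
        elif lower in ARTICLES:
            continue  # simulate a missing-article error
        elif lower in PREPOSITIONS:
            choices = [p for p in PREPOSITIONS if p != lower]
            noisy.append(rng.choice(choices))  # simulate a wrong-preposition error
        else:
            noisy.append(token)  # no rule applies to this token
    return " ".join(noisy)


def augment(pairs, error_rate=0.1, seed=0):
    """Append (noisy source, clean target) pairs to the original parallel data."""
    rng = random.Random(seed)
    augmented = list(pairs)
    for source, target in pairs:
        noisy = inject_errors(source, error_rate, rng)
        if noisy != source:
            augmented.append((noisy, target))
    return augmented
```

Under these assumptions, each noisy English source keeps its clean target translation, so a model trained on the augmented set sees disfluent input paired with fluent output. Varying `error_rate` is one simple way to approximate learners at different fluency levels.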