Henry Li Xinyuan

Also published as: Henry Li Xinyuan


2024

JHU IWSLT 2024 Dialectal and Low-resource System Description
Nathaniel Romney Robinson | Kaiser Sun | Cihan Xiao | Niyati Bafna | Weiting Tan | Haoran Xu | Henry Li Xinyuan | Ankur Kejriwal | Sanjeev Khudanpur | Kenton Murray | Paul McNamee
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

Johns Hopkins University (JHU) submitted systems for all eight language pairs in the 2024 Low-Resource Language Track. The main effort of this work revolves around fine-tuning large and publicly available models in three proposed systems: i) end-to-end speech translation (ST) fine-tuning of SeamlessM4T v2; ii) ST fine-tuning of Whisper; iii) a cascaded system involving automatic speech recognition with fine-tuned Whisper and machine translation with NLLB. On top of the systems above, we conduct a comparative analysis of different training paradigms, such as intra-distillation for NLLB as well as joint training and curriculum learning for SeamlessM4T v2. Our results show that the best-performing approach differs by language pair, but that i) fine-tuned SeamlessM4T v2 tends to perform best for source languages on which it was pre-trained, ii) multi-task training helps Whisper fine-tuning, iii) cascaded systems with Whisper and NLLB tend to outperform Whisper alone, and iv) intra-distillation helps NLLB fine-tuning.

2023

JHU IWSLT 2023 Multilingual Speech Translation System Description
Henry Li Xinyuan | Neha Verma | Bismarck Bamfo Odoom | Ujvala Pradeep | Matthew Wiesner | Sanjeev Khudanpur
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

We describe the Johns Hopkins ACL 60-60 Speech Translation systems submitted to the IWSLT 2023 Multilingual track, in which we were tasked with translating ACL presentations from English into 10 languages. We developed cascaded speech translation systems for both the constrained and unconstrained subtracks. Our systems make use of pre-trained models as well as domain-specific corpora for this highly technical evaluation-only task. We find that the specific technical domain of ACL presentations presents a unique challenge for both ASR and MT, and we present an error analysis and an ACL-specific corpus we produced to enable further work in this area.