Jian Luan


2023

The Xiaomi AI Lab’s Speech Translation Systems for IWSLT 2023 Offline Task, Simultaneous Task and Speech-to-Speech Task
Wuwei Huang | Mengge Liu | Xiang Li | Yanzhi Tian | Fengyu Yang | Wen Zhang | Jian Luan | Bin Wang | Yuhang Guo | Jinsong Su
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This system description paper introduces the systems submitted by Xiaomi AI Lab to the three tracks of the IWSLT 2023 Evaluation Campaign, namely the offline speech translation (Offline-ST) track, the offline speech-to-speech translation (Offline-S2ST) track, and the simultaneous speech translation (Simul-ST) track. All our submissions for these three tracks only involve the English-Chinese language direction. Our English-Chinese speech translation systems are constructed using large-scale pre-trained models as the foundation. Specifically, we fine-tune these models’ corresponding components for various downstream speech translation tasks. Moreover, we implement several popular techniques, such as data filtering, data augmentation, speech segmentation, and model ensemble, to improve the system’s overall performance. Extensive experiments show that our systems achieve a significant improvement over the strong baseline systems in terms of the automatic evaluation metric.

Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation
Zhongjian Miao | Wen Zhang | Jinsong Su | Xiang Li | Jian Luan | Yidong Chen | Bin Wang | Min Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Conventional knowledge distillation (KD) approaches are commonly employed to compress neural machine translation (NMT) models. However, they only obtain one lightweight student each time. Consequently, we have to conduct KD multiple times when different students are required at the same time, which could be resource-intensive. Additionally, these students are individually optimized and thus lack interactions with each other, so their potential is not fully realized. In this work, we propose a novel All-In-One Knowledge Distillation (AIO-KD) framework for NMT, which generates multiple satisfactory students at once. Under AIO-KD, we first randomly extract fewer-layer subnetworks from the teacher as the sample students. Then, we jointly optimize the teacher and these students, where the students simultaneously learn the knowledge from the teacher and interact with other students via mutual learning. At deployment, we re-extract the candidate students that satisfy the specifications of various devices. In particular, we adopt carefully designed strategies for AIO-KD: 1) we dynamically detach gradients to prevent poorly performing students from negatively affecting the teacher during the knowledge transfer, which could subsequently impact other students; 2) we design a two-stage mutual learning strategy, which alleviates the negative impacts of poorly performing students on the early-stage student interactions. Extensive experiments and in-depth analyses on three benchmarks demonstrate the effectiveness and eco-friendliness of AIO-KD. Our source code is available at https://github.com/DeepLearnXMU/AIO-KD.
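
The step of randomly extracting fewer-layer subnetworks from the teacher can be illustrated with a minimal sketch. It is not taken from the AIO-KD repository: the function names `sample_students` and `extract_layers`, the choice of (encoder depth, decoder depth) pairs as student configurations, and the choice to keep the bottom layers of the teacher are all assumptions made only for illustration.

```python
import random


def sample_students(teacher_enc_layers, teacher_dec_layers, num_students, seed=None):
    """Sample distinct (encoder_depth, decoder_depth) student configurations.

    Hypothetical helper: every configuration has at most as many layers as
    the teacher on each side and is strictly smaller on at least one side.
    """
    rng = random.Random(seed)
    candidates = [
        (enc, dec)
        for enc in range(1, teacher_enc_layers + 1)
        for dec in range(1, teacher_dec_layers + 1)
        if (enc, dec) != (teacher_enc_layers, teacher_dec_layers)
    ]
    return rng.sample(candidates, num_students)


def extract_layers(teacher_layers, depth):
    """Return the first `depth` layers of the teacher stack.

    Assumption: a student reuses the bottom layers of the teacher, so the
    returned list holds the same layer objects rather than copies.
    """
    return teacher_layers[:depth]


if __name__ == "__main__":
    # Placeholder layer names stand in for real Transformer layers.
    teacher_encoder = [f"enc_layer_{i}" for i in range(6)]
    teacher_decoder = [f"dec_layer_{i}" for i in range(6)]
    for enc_depth, dec_depth in sample_students(6, 6, num_students=3, seed=0):
        student_encoder = extract_layers(teacher_encoder, enc_depth)
        student_decoder = extract_layers(teacher_decoder, dec_depth)
        print(enc_depth, dec_depth, len(student_encoder), len(student_decoder))
```

Because the sampled students share layer objects with the teacher in this sketch, jointly optimizing them would update the teacher's parameters as well, which is the setting in which detaching gradients from weak students becomes relevant.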

Exploring Better Text Image Translation with Multimodal Codebook
Zhibin Lan | Jiawei Yu | Xiang Li | Wen Zhang | Jian Luan | Bin Wang | Degen Huang | Jinsong Su
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Text image translation (TIT) aims to translate the source texts embedded in an image into target translations; it has a wide range of applications and thus important research value. However, current studies on TIT are confronted with two main bottlenecks: 1) the task lacks a publicly available TIT dataset; 2) dominant models are constructed in a cascaded manner, which tends to suffer from the error propagation of optical character recognition (OCR). In this work, we first annotate a Chinese-English TIT dataset named OCRMT30K, providing convenience for subsequent studies. Then, we propose a TIT model with a multimodal codebook, which is able to associate the image with relevant texts, providing useful supplementary information for translation. Moreover, we present a multi-stage training framework involving text machine translation, image-text alignment, and TIT tasks, which fully exploits additional bilingual texts, an OCR dataset, and our OCRMT30K dataset to train our model. Extensive experiments and in-depth analyses strongly demonstrate the effectiveness of our proposed model and training framework.

2022

BIT-Xiaomi’s System for AutoSimTrans 2022
Mengge Liu | Xiang Li | Bao Chen | Yanzhi Tian | Tianwei Lan | Silin Li | Yuhang Guo | Jian Luan | Bin Wang
Proceedings of the Third Workshop on Automatic Simultaneous Translation

This system paper describes the BIT-Xiaomi simultaneous translation system for the AutoSimTrans 2022 simultaneous translation challenge. We participated in three tracks: the Zh-En text-to-text track, the Zh-En audio-to-text track, and the En-Es text-to-text track. In our system, wait-k is employed to train prefix-to-prefix translation models. We integrate streaming chunking to detect boundaries as the source stream is read in. We further improve our system with data selection, data augmentation, and R-Drop training methods. Results show that our wait-k implementation outperforms the organizer’s baseline by up to 8 BLEU, and our proposed streaming chunking method yields a further improvement of about 2 BLEU in the low-latency regime.
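
For readers unfamiliar with wait-k, the read/write schedule of the standard policy can be sketched as follows: the model first reads k source tokens, then alternates between writing one target token and reading one source token, and writes the remaining target tokens once the source is exhausted. This is only an illustration of the general policy, not the BIT-Xiaomi implementation; the function name `wait_k_schedule` and the "READ"/"WRITE" action labels are assumptions introduced here.

```python
def wait_k_schedule(source_len, target_len, k):
    """Return the READ/WRITE action sequence of a standard wait-k policy.

    Before writing target token t (1-based), the policy must have read
    min(k + t - 1, source_len) source tokens.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions = []
    read = 0
    written = 0
    while written < target_len:
        # Number of source tokens required before the next target token.
        needed = min(k + written, source_len)
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
        written += 1
    return actions


if __name__ == "__main__":
    # With k=2, two source tokens are read before the first target token.
    print(" ".join(wait_k_schedule(source_len=5, target_len=5, k=2)))
```

A smaller k lowers latency because writing starts earlier, at the cost of translating from a shorter source prefix.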