Yuzhang Wu
2024
Translate-and-Revise: Boosting Large Language Models for Constrained Translation
Pengcheng Huang
|
Yongyu Mu
|
Yuzhang Wu
|
Bei Li
|
Chunyang Xiao
|
Tong Xiao
|
Jingbo Zhu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prompts. However, LLMs cannot always guarantee the adequacy of translation, and, in some cases, ignore the given constraints. This is in part because LLMs might be overly confident in their predictions, overriding the influence of the constraints. To overcome this overriding behaviour, we propose to add a revision process that encourages LLMs to correct the outputs by prompting them about the constraints that have not yet been met. We evaluate our approach on four constrained translation tasks, encompassing both lexical and structural constraints in multiple constraint domains. Experiments show a 15% improvement in constraint-based translation accuracy over standard LLMs, and the approach also significantly outperforms state-of-the-art neural machine translation (NMT) methods.
Introduction
Constrained translation seeks to generate translations that adhere to pre-specified constraints. To achieve this, conventional approaches impose constraints on machine translation systems and force them to follow the constraints during inference (Hokamp and Liu, 2017; Hasler et al., 2018; Dinu et al., 2019; Bergmanis and Pinnis, 2021b; Wang et al., 2022b; Ailem et al., 2022). More recently, large language models (LLMs) have been shown to be strong translation systems (Hendy et al., 2023; Moslem et al., 2023). They provide a general way to involve various instructions, demonstrations, and constraints into the translation process (Mu et al., 2023; Bogoychev and Chen, 2023), enabling us to perform constrained translation using off-the-shelf, well-trained LLMs.
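The abstract describes the approach only at a high level. As a rough illustration, the sketch below shows what such a translate-then-revise prompting loop could look like for lexical constraints; the `chat` callable, the substring check for unmet terms, and all prompt wording are assumptions for illustration, not the authors' implementation.

```python
from typing import Callable


def translate_with_constraints(source: str,
                               constraints: dict[str, str],
                               chat: Callable[[str], str],
                               max_rounds: int = 3) -> str:
    """Translate `source` under term constraints, then revise until they are met.

    `chat` is a placeholder for any LLM chat-completion call; `constraints`
    maps source-side terms to required target-side translations.
    """
    constraint_text = "; ".join(f"'{s}' -> '{t}'" for s, t in constraints.items())
    prompt = (f"Translate the following sentence into English, "
              f"using these term translations: {constraint_text}\n"
              f"Source: {source}")
    translation = chat(prompt)

    for _ in range(max_rounds):
        # Check which required target-side terms are missing from the output.
        unmet = [t for t in constraints.values()
                 if t.lower() not in translation.lower()]
        if not unmet:
            break
        # Revision step: point the model at the constraints it has not satisfied.
        revise_prompt = (f"Source: {source}\n"
                         f"Your translation: {translation}\n"
                         f"The required terms {', '.join(unmet)} are missing. "
                         f"Revise the translation so that every required term "
                         f"appears, while keeping it adequate and fluent.")
        translation = chat(revise_prompt)
    return translation
```

The revision prompt deliberately lists only the constraints that were not satisfied, which mirrors the paper's idea of steering the model back toward the constraints it has overridden rather than re-translating from scratch.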
Revealing the Parallel Multilingual Learning within Large Language Models
Yongyu Mu
|
Peinan Feng
|
Zhiquan Cao
|
Yuzhang Wu
|
Bei Li
|
Chenglong Wang
|
Tong Xiao
|
Kai Song
|
Tongran Liu
|
Chunliang Zhang
|
JingBo Zhu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) can handle multilingual and cross-lingual text within a single input; however, previous works leveraging multilingualism in LLMs primarily focus on using English as the pivot language to enhance language understanding and reasoning. Given that multiple languages can compensate for the limitations of any single language, a natural next step is to enrich the model's learning context by integrating the original input with its translations into multiple languages. In this paper, we start by revealing that LLMs learn from parallel multilingual input (PMI). Our comprehensive evaluation shows that PMI enhances the model's comprehension of the input, achieving better performance than conventional in-context learning (ICL). Furthermore, to explore how multilingual processing affects prediction, we examine the activated neurons in LLMs. Surprisingly, involving more languages in the input activates fewer neurons, leading to more focused and effective neural activation patterns. Moreover, this neural reaction coincidentally mirrors the neuroscience insight about synaptic pruning, highlighting a similarity between artificial and biological ‘brains’.
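As a rough illustration of the parallel multilingual input idea, the sketch below builds a prompt that pairs the original input with its translations into a few languages before posing the question. The `translate` callable, the language set, and the prompt wording are assumptions for illustration, not the paper's actual setup.

```python
from typing import Callable, Sequence


def build_pmi_prompt(question: str,
                     context: str,
                     translate: Callable[[str, str], str],
                     languages: Sequence[str] = ("German", "French", "Chinese")) -> str:
    """Build a prompt whose context is given in several parallel languages.

    `translate(text, language)` is a placeholder for any translation function
    (e.g., another LLM call or an MT system).
    """
    # Collect the original context plus its translations as parallel views.
    views = [f"English: {context}"]
    for lang in languages:
        views.append(f"{lang}: {translate(context, lang)}")
    parallel_context = "\n".join(views)
    return (
        "The following passages express the same content in different languages:\n"
        f"{parallel_context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting prompt is then passed to the LLM in place of the monolingual context, so the comparison with conventional ICL only changes how the input is presented, not the model or the task.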