Neural machine translation (NMT) has advanced significantly, yet challenges remain in adapting to new domains. In scenarios where bilingual data is limited, this issue is further exacerbated. To address this, we propose kNN-LM-NMT, a method that leverages semantically similar target-language sentences in the kNN framework. Our approach generates a probability distribution over these sentences during decoding, and this distribution is then interpolated with the NMT model's distribution. Additionally, we introduce an n-gram-based approach that focuses on similar fragments, enabling the model to avoid the noise introduced by the non-similar parts. To enhance accuracy, we further incorporate cross-lingual retrieval similarity to refine the kNN probability distribution. Extensive experiments on multi-domain datasets demonstrate significant performance improvements in both high-resource and low-resource scenarios. Our approach effectively extracts translation knowledge from limited target-domain data and benefits substantially from large-scale monolingual data for robust context representation.
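To make the interpolation step concrete, below is a minimal sketch of how a kNN distribution over retrieved target-language contexts can be mixed with the NMT model's distribution at each decoding step. The tensor names, the Euclidean retrieval, and the fixed interpolation weight `lam` are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def knn_interpolate(nmt_logits, query, datastore_keys, datastore_token_ids,
                    vocab_size, k=8, temperature=10.0, lam=0.5):
    """Interpolate the NMT distribution with a kNN distribution built from
    retrieved target-language contexts (all names here are illustrative)."""
    # Distances between the decoder query and the stored context representations.
    dists = torch.cdist(query.unsqueeze(0), datastore_keys).squeeze(0)  # (num_entries,)
    knn_dists, knn_idx = dists.topk(k, largest=False)

    # Convert negative distances into retrieval weights.
    weights = F.softmax(-knn_dists / temperature, dim=-1)               # (k,)

    # Scatter the weights onto the vocabulary positions of the retrieved tokens.
    p_knn = torch.zeros(vocab_size)
    p_knn.scatter_add_(0, datastore_token_ids[knn_idx], weights)

    # Mix with the NMT model's own distribution.
    p_nmt = F.softmax(nmt_logits, dim=-1)
    return lam * p_knn + (1.0 - lam) * p_nmt
```

In practice the interpolation weight is often tuned per domain or predicted dynamically; the fixed `lam` above simply keeps the sketch self-contained.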
Despite the significant improvements achieved by large language models (LLMs) on English reasoning tasks, these models continue to struggle with multilingual reasoning. Recent studies leverage a full-parameter, two-stage training paradigm to teach models to first understand non-English questions and then reason. However, this method suffers from both substantial computational costs and catastrophic forgetting. The fundamental cause is that, with the primary goal of enhancing multilingual comprehension, an excessive number of irrelevant layers and parameters are tuned during the first stage. Given our finding that the representation learning of languages is conducted primarily in the lower-level layers, we propose an efficient multilingual reasoning alignment approach that precisely identifies and fine-tunes the layers responsible for handling multilingualism. Experimental results show that our method, SLAM, tunes only the feed-forward sub-layers of 6 layers, comprising 6.5-8% of all parameters in 7B and 13B LLMs, and achieves better average performance than all strong baselines across 10 languages. Meanwhile, SLAM involves only one training stage, reducing training time by 4.1-11.9× compared to the two-stage method.
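The selective tuning idea can be illustrated with a short sketch: freeze the whole model, then re-enable gradients only for the feed-forward (MLP) sub-layers of the lowest few decoder layers. The model identifier, the choice of the lowest six layers, and the `model.model.layers` / `layer.mlp` module paths are assumptions based on the standard Hugging Face LLaMA layout, not on SLAM's released code.

```python
from transformers import AutoModelForCausalLM

# Load a LLaMA-style model (identifier is illustrative).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Freeze everything first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the feed-forward sub-layers of the lowest six decoder layers,
# which is where the abstract locates multilingual representation learning.
NUM_ALIGN_LAYERS = 6  # assumed to be the lowest 6 layers
for layer in model.model.layers[:NUM_ALIGN_LAYERS]:
    for param in layer.mlp.parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable / total:.1%} of {total:,}")
```

Because only these sub-layers receive gradients, optimizer state and backward computation shrink accordingly, which is consistent with the reported single-stage training speedup.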
Neural Machine Translation (NMT) encounters challenges when translating in new domains and for low-resource languages. To address these issues, researchers have proposed methods to integrate additional knowledge into NMT, such as translation memories (TMs). However, finding TMs that closely match the input sentence remains challenging, particularly in specific domains. On the other hand, monolingual data is widely accessible in most languages, and back-translation is seen as a promising approach for utilizing target-language data; nevertheless, it still requires additional training. In this paper, we introduce Pseudo-kNN-MT, a variant of k-nearest neighbor machine translation (kNN-MT) that utilizes target-language data by constructing a pseudo datastore. Furthermore, we investigate the utility of large language models (LLMs) for the kNN component. Experimental results demonstrate that our approach exhibits strong domain adaptation capability in both high-resource and low-resource machine translation. Notably, LLMs prove beneficial for building robust NMT systems.
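As a rough illustration of building a datastore from monolingual target-language text alone, the sketch below runs an off-the-shelf causal LM over target sentences and stores (context representation, next token) pairs. The model choice, the use of the last hidden layer as the key, and the example sentences are all illustrative assumptions; the actual pseudo-datastore construction in Pseudo-kNN-MT may differ, and at decoding time the key space must match the query representation used by the translation system.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative context encoder; the paper explores LLMs for the kNN component.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

keys, values = [], []  # datastore: context representation -> next target token

@torch.no_grad()
def add_to_datastore(target_sentence: str) -> None:
    """Run the LM over a monolingual target sentence and store
    (hidden state, next token) pairs as datastore entries."""
    ids = tokenizer(target_sentence, return_tensors="pt").input_ids
    hidden = lm(ids, output_hidden_states=True).hidden_states[-1][0]  # (seq_len, dim)
    # The state at position t serves as the key for predicting token t+1.
    keys.append(hidden[:-1])
    values.append(ids[0, 1:])

for sent in ["Die Lieferung erfolgt innerhalb von zwei Wochen.",
             "Bitte bewahren Sie die Rechnung auf."]:
    add_to_datastore(sent)

datastore_keys = torch.cat(keys)      # queried with the decoder state at test time
datastore_values = torch.cat(values)  # token ids used to form the kNN distribution
```

The resulting key and value tensors can then be queried with the same nearest-neighbor interpolation scheme sketched earlier, without any additional model training.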