2025
From English to Second Language Mastery: Enhancing LLMs with Cross-Lingual Continued Instruction Tuning
Linjuan Wu | Hao-Ran Wei | Baosong Yang | Weiming Lu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Supervised Fine-Tuning (SFT) with translated instruction data effectively adapts Large Language Models (LLMs) from English to non-English languages. We introduce Cross-Lingual Continued Instruction Tuning (X-CIT), which fully leverages translation-based parallel instruction data to enhance cross-lingual adaptability. X-CIT emulates the human process of second language acquisition and is guided by Chomsky’s Principles and Parameters Theory. It first fine-tunes the LLM on English instruction data to establish foundational capabilities (i.e., Principles), then continues with target-language translation and customized chat-instruction data to adjust “parameters” specific to the target language. This chat-instruction data captures alignment information in the translated parallel data, guiding the model to initially think and respond in its native language before transitioning to the target language. To further mimic human learning progression, we incorporate Self-Paced Learning (SPL) during continued training, allowing the model to advance from simple to complex tasks. Implemented on Llama-2-7B across five languages, X-CIT was evaluated against three objective benchmarks and an LLM-as-a-judge benchmark, improving over the strongest baseline by an average of 1.97% on the objective benchmarks and 8.2% on the LLM-as-a-judge benchmark.
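To make the self-paced scheduling concrete, here is a minimal sketch of an easy-to-hard curriculum, assuming a hypothetical per-example difficulty score (e.g., loss under the current model). It illustrates the SPL idea only; the paper's actual scoring and continued-training setup are not reproduced.

```python
# Illustrative sketch of self-paced scheduling: examples are ranked by an
# assumed difficulty score and the curriculum admits progressively harder
# examples each round. The difficulty values and round schedule are hypothetical.

from dataclasses import dataclass

@dataclass
class Example:
    text: str
    difficulty: float  # e.g., per-token loss under the current model (assumed)

def self_paced_batches(examples, num_rounds=3):
    """Yield progressively larger, harder subsets (easy -> hard)."""
    ordered = sorted(examples, key=lambda ex: ex.difficulty)
    for r in range(1, num_rounds + 1):
        cutoff = int(len(ordered) * r / num_rounds)  # widen the admitted pool
        yield ordered[:cutoff]

if __name__ == "__main__":
    pool = [Example(f"sample-{i}", difficulty=d)
            for i, d in enumerate([0.2, 1.5, 0.7, 2.3, 0.4])]
    for round_id, subset in enumerate(self_paced_batches(pool), start=1):
        print(round_id, [ex.text for ex in subset])
```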
AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification
Xuan Zhang | Yongliang Shen | Zhe Zheng | Linjuan Wu | Wenqi Zhang | Yuchen Yan | Qiuying Peng | Jun Wang | Weiming Lu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have demonstrated remarkable capabilities in tool learning. In real-world scenarios, user queries are often ambiguous and incomplete, requiring effective clarification. However, existing interactive clarification approaches face two critical limitations: reliance on manually constructed datasets, which inherently constrains training data scale and diversity, and lack of error correction mechanisms during multi-turn clarification, leading to error accumulation that compromises both accuracy and efficiency. We present AskToAct, which addresses these challenges by exploiting the structural mapping between queries and their tool invocation solutions. Our key insight is that tool parameters naturally represent explicit user intents. By systematically removing key parameters from queries while retaining them as ground truth, we enable automated construction of high-quality training data. We further enhance model robustness through error-correction pairs and selective masking, enabling dynamic error detection during clarification interactions. Comprehensive experiments demonstrate that AskToAct significantly outperforms existing approaches, achieving above 57% accuracy in recovering critical unspecified intents and enhancing clarification efficiency by an average of 10.46% while maintaining high accuracy in tool invocation. Our framework exhibits robust performance across different model architectures and successfully generalizes to entirely unseen APIs without additional training, achieving performance comparable to GPT-4o with substantially fewer computational resources.
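A rough sketch of the data-construction idea, under assumed field names and a hypothetical tool call: a key argument is removed from the query text while being retained as the ground-truth answer to a clarification turn. This is an illustration, not the paper's pipeline.

```python
# Sketch of automated clarification-data construction: drop a key tool argument
# from the query, keep it as ground truth for the clarification turn. The tool
# call, field names, and crude string removal are illustrative assumptions.

import json

def build_clarification_sample(query: str, tool_call: dict, drop_arg: str):
    value = tool_call["arguments"][drop_arg]
    ambiguous_query = query.replace(str(value), "").strip()  # crude removal for illustration
    return {
        "query": ambiguous_query,
        "missing_argument": drop_arg,
        "ground_truth": value,          # retained as the expected clarification answer
        "target_tool_call": tool_call,  # full call the model should recover after clarifying
    }

if __name__ == "__main__":
    call = {"name": "book_flight",
            "arguments": {"destination": "Tokyo", "date": "2025-05-01"}}
    sample = build_clarification_sample(
        "Book me a flight to Tokyo on 2025-05-01", call, drop_arg="date")
    print(json.dumps(sample, indent=2))
```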
Enhancing LLM Language Adaption through Cross-lingual In-Context Pre-training
Linjuan Wu | Hao-Ran Wei | Huan Lin | Tianhao Li | Baosong Yang | Fei Huang | Weiming Lu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) exhibit remarkable multilingual capabilities despite English-dominated pre-training, which is attributed to cross-lingual mechanisms that arise during pre-training. Existing methods for enhancing cross-lingual transfer remain constrained by parallel resources, suffering from limited linguistic and domain coverage. We propose Cross-lingual In-context Pre-training (CrossIC-PT), a simple and scalable approach that enhances cross-lingual transfer by leveraging semantically related bilingual texts via simple next-word prediction. We construct CrossIC-PT samples by interleaving semantically related bilingual Wikipedia documents into a single context window. To address context-window size constraints, we implement a systematic segmentation policy to split long bilingual document pairs into chunks while adjusting the sliding window mechanism to preserve contextual coherence. We further extend data availability through a semantic retrieval framework to construct CrossIC-PT samples from a web-crawled corpus. Experimental results demonstrate that CrossIC-PT improves multilingual performance on three models (Llama-3.1-8B, Qwen2.5-7B, and Qwen2.5-1.5B) across six target languages, yielding performance gains of 3.79%, 3.99%, and 1.95%, respectively, with additional improvements after data augmentation.
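The interleaving step can be pictured with a small sketch, assuming whitespace tokenization and arbitrary chunk/window sizes: chunks from a related document pair alternate languages inside one context window, truncated at the window limit. The paper's segmentation policy and sliding-window handling are not reproduced here.

```python
# Sketch of bilingual in-context sample construction: split a related document
# pair into chunks and interleave the chunks into one context window. Chunk
# size, window size, and whitespace tokenization are illustrative assumptions.

def chunk(tokens, size):
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def interleave_pair(doc_en: str, doc_tgt: str, chunk_size=64, window=512):
    en_chunks = chunk(doc_en.split(), chunk_size)
    tgt_chunks = chunk(doc_tgt.split(), chunk_size)
    window_tokens = []
    for en_c, tgt_c in zip(en_chunks, tgt_chunks):  # alternate languages
        for piece in (en_c, tgt_c):
            if len(window_tokens) + len(piece) > window:
                return window_tokens
            window_tokens.extend(piece)
    return window_tokens

if __name__ == "__main__":
    sample = interleave_pair("the cat sat on the mat " * 40,
                             "die Katze sass auf der Matte " * 40,
                             chunk_size=8, window=64)
    print(len(sample), " ".join(sample[:16]))
```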
2024
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Wenqi Zhang | Yongliang Shen | Linjuan Wu | Qiuying Peng | Jun Wang | Yueting Zhuang | Weiming Lu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The reflection capacity of Large Language Models (LLMs) has garnered extensive attention. Post-hoc prompting strategies, e.g., Reflexion and Self-Refine, refine an LLM’s response based on self-evaluated or external feedback. However, recent research indicates that, without external feedback, an LLM’s intrinsic reflection is unstable. Our investigation reveals that the key bottleneck is the quality of the self-evaluated feedback. We find that LLMs often exhibit overconfidence or high randomness when self-evaluating, offering stubborn or inconsistent feedback that causes poor reflection. To remedy this, we advocate Self-Contrast: it adaptively explores diverse solving perspectives tailored to the request, contrasts the differences, and summarizes these discrepancies into a checklist that can be used to re-examine and eliminate them. Our method endows the LLM with diverse perspectives to alleviate stubborn biases. Moreover, the discrepancies indicate potential errors or inherent uncertainties that the LLM often overlooks. Reflecting upon these can catalyze more accurate and stable reflection. Experiments on a series of reasoning and translation tasks with different LLMs underscore the effectiveness and generality of our strategy.
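The explore-contrast-checklist flow can be sketched as a prompting loop around a hypothetical `llm(prompt) -> str` callable; the prompt wording and stub model below are assumptions, not the paper's actual prompts.

```python
# Sketch of the Self-Contrast flow: sample diverse perspectives, contrast them
# into a checklist of discrepancies, then re-examine with that checklist.
# The `llm` callable and prompt templates are hypothetical.

from typing import Callable, List

def self_contrast(question: str, llm: Callable[[str], str], n_perspectives: int = 3) -> str:
    # 1. Explore diverse solving perspectives tailored to the request.
    solutions: List[str] = [
        llm(f"Solve the following problem from perspective #{i + 1}:\n{question}")
        for i in range(n_perspectives)
    ]
    # 2. Contrast the solutions and summarize discrepancies into a checklist.
    joined = "\n---\n".join(solutions)
    checklist = llm(f"Contrast these solutions and list their discrepancies as a checklist:\n{joined}")
    # 3. Re-examine the problem with the checklist to produce a revised answer.
    return llm(f"Question:\n{question}\nChecklist of discrepancies:\n{checklist}\n"
               f"Resolve the discrepancies and give a final answer.")

if __name__ == "__main__":
    def stub_llm(prompt: str) -> str:  # stand-in so the sketch runs end to end
        return f"[model output for: {prompt[:40]}...]"
    print(self_contrast("What is 17 * 23?", stub_llm))
```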
TimeToM: Temporal Space is the Key to Unlocking the Door of Large Language Models’ Theory-of-Mind
Guiyang Hou | Wenqi Zhang | Yongliang Shen | Linjuan Wu | Weiming Lu
Findings of the Association for Computational Linguistics: ACL 2024
Theory of Mind (ToM), the cognitive ability to reason about the mental states of ourselves and others, is the foundation of social interaction. Although ToM comes naturally to humans, it poses a significant challenge to even the most advanced Large Language Models (LLMs). Due to the complex logical chains in ToM reasoning, especially in higher-order ToM questions, simply utilizing reasoning methods like Chain of Thought (CoT) will not improve the ToM capabilities of LLMs. We present TimeToM, which constructs a temporal space and uses it as the foundation to improve the ToM capabilities of LLMs in multiple scenarios. Specifically, within the temporal space, we construct a Temporal Belief State Chain (TBSC) for each character and, inspired by the cognitive perspective of the social world model, divide the TBSC into self-world beliefs and social-world beliefs, aligning with first-order ToM (first-order beliefs) and higher-order ToM (higher-order beliefs) questions, respectively. Moreover, we design a novel tool, the belief solver, which, by considering belief communication between characters in the temporal space, can transform a character’s higher-order beliefs into another character’s first-order beliefs during a belief communication period.
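A toy sketch of the belief-chain idea follows, assuming a simple time-stamped belief store and a naive communication rule; the class and the transformation are illustrative stand-ins, not the paper's exact formulation.

```python
# Toy sketch of a Temporal Belief State Chain: characters accumulate
# time-stamped beliefs, and beliefs shared during a communication period can be
# re-expressed as another character's beliefs about them. All names and the
# communication rule are hypothetical.

from collections import defaultdict

class TemporalBeliefChain:
    def __init__(self):
        # character -> list of (time, belief) tuples, in observation order
        self.beliefs = defaultdict(list)

    def observe(self, character: str, time: int, belief: str):
        self.beliefs[character].append((time, belief))

    def beliefs_between(self, character: str, start: int, end: int):
        return [b for t, b in self.beliefs[character] if start <= t <= end]

def communicated_beliefs(chain, believer, other, comm_start, comm_end):
    """During a communication period, `believer` gains beliefs about `other`'s beliefs."""
    return {f"{believer} believes {other} believes: {b}"
            for b in chain.beliefs_between(other, comm_start, comm_end)}

if __name__ == "__main__":
    chain = TemporalBeliefChain()
    chain.observe("Anne", 1, "the marble is in the basket")
    chain.observe("Anne", 3, "the marble is in the box")
    print(communicated_beliefs(chain, "Sally", "Anne", comm_start=1, comm_end=2))
```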
2023
Struct-XLM: A Structure Discovery Multilingual Language Model for Enhancing Cross-lingual Transfer through Reinforcement Learning
Linjuan Wu | Weiming Lu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Cross-lingual transfer learning heavily relies on well-aligned cross-lingual representations. Syntactic structure is recognized as beneficial for cross-lingual transfer, but little research utilizes it to align representations in multilingual pre-trained language models (PLMs). Additionally, existing methods require syntactic labels that are difficult to obtain and of poor quality for low-resource languages. To address this gap, we propose Struct-XLM, a novel multilingual language model that leverages reinforcement learning (RL) to autonomously discover universal syntactic structures for improving the cross-lingual representation alignment of the PLM. Struct-XLM integrates a policy network (PNet) and a translation ranking task. The PNet is designed to discover structural information and integrate it into the last layer of the PLM through a structural multi-head attention module to obtain a structural representation. The translation ranking task obtains a delayed reward based on the structural representation to optimize the PNet while improving the alignment of cross-lingual representations. Experiments show the effectiveness of the proposed approach for enhancing cross-lingual transfer of the multilingual PLM on the XTREME benchmark.
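The structure-conditioned attention can be pictured with a bare-bones numpy sketch: binary structure decisions (standing in for PNet outputs) define segments, and an extra attention pass is masked to stay within segments. The random policy and single-head attention are assumptions; the paper's PNet, reward, and multi-head module are not reproduced.

```python
# Sketch of structure-masked attention: hypothetical boundary decisions define
# segments, and attention is restricted to positions in the same segment.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def structural_attention(hidden, structure_actions):
    """hidden: (seq, dim); structure_actions: (seq,) binary segment-boundary decisions."""
    seq, dim = hidden.shape
    segment_ids = np.cumsum(structure_actions)            # group tokens into segments
    mask = (segment_ids[:, None] == segment_ids[None, :]).astype(float)
    scores = hidden @ hidden.T / np.sqrt(dim)
    scores = np.where(mask > 0, scores, -1e9)             # block cross-segment attention
    return softmax(scores, axis=-1) @ hidden

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(6, 8))
    actions = np.array([0, 0, 1, 0, 0, 1])  # hypothetical policy output
    print(structural_attention(h, actions).shape)  # (6, 8)
```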
Good Meta-tasks Make A Better Cross-lingual Meta-transfer Learning for Low-resource Languages
Linjuan Wu | Zongyi Guo | Baoliang Cui | Haihong Tang | Weiming Lu
Findings of the Association for Computational Linguistics: EMNLP 2023
Model-agnostic meta-learning has garnered attention as a promising technique for enhancing few-shot cross-lingual transfer learning in low-resource scenarios. However, little attention has been paid to the impact of data selection strategies on this cross-lingual meta-transfer method, particularly the sampling of cross-lingual meta-training data (i.e., meta-tasks) at the syntactic level to reduce language gaps. In this paper, we propose a Meta-Task Collector-based Cross-lingual Meta-Transfer framework (MeTaCo-XMT) to adapt different data selection strategies to construct meta-tasks for meta-transfer learning. Syntactic differences affect transfer performance, so we consider a syntactic-similarity sampling strategy and propose a syntactic distance metric model consisting of a syntactic encoder block based on the pre-trained model and a distance metric block using Word Mover’s Distance (WMD). Additionally, we conduct experiments with three different data selection strategies to instantiate our framework and analyze their performance impact. Experimental results on two multilingual NLP datasets, Wikiann and TydiQA, demonstrate the significant superiority of our approach compared to existing strong baselines.
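A simplified sketch of the similarity-driven sampling: candidates are scored against a target support set with a relaxed (nearest-neighbor) approximation of Word Mover's Distance over feature vectors, and the closest ones form a meta-task. The random features and the relaxed-WMD approximation are assumptions; the paper's syntactic encoder and exact WMD are not used here.

```python
# Sketch of syntactic-similarity meta-task sampling with a relaxed WMD lower
# bound: each vector in one set moves to its nearest neighbor in the other.

import numpy as np

def relaxed_wmd(a: np.ndarray, b: np.ndarray) -> float:
    """Lower-bound relaxation of WMD between two sets of feature vectors."""
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (len_a, len_b)
    return float(dists.min(axis=1).mean())

def sample_meta_task(candidates, target, k=2):
    scored = sorted(candidates, key=lambda c: relaxed_wmd(c["features"], target))
    return scored[:k]  # the k closest candidates form the meta-task

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target_feats = rng.normal(size=(5, 16))      # stand-in for target-language features
    pool = [{"id": i, "features": rng.normal(size=(5, 16))} for i in range(10)]
    print([c["id"] for c in sample_meta_task(pool, target_feats, k=3)])
```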
2022
Learning Disentangled Semantic Representations for Zero-Shot Cross-Lingual Transfer in Multilingual Machine Reading Comprehension
Linjuan Wu | Shaojuan Wu | Xiaowang Zhang | Deyi Xiong | Shizhan Chen | Zhiqiang Zhuang | Zhiyong Feng
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multilingual pre-trained models are able to zero-shot transfer knowledge from rich-resource to low-resource languages in machine reading comprehension (MRC). However, inherent linguistic discrepancies in different languages could make answer spans predicted by zero-shot transfer violate syntactic constraints of the target language. In this paper, we propose a novel multilingual MRC framework equipped with a Siamese Semantic Disentanglement Model (S2DM) to disassociate semantics from syntax in representations learned by multilingual pre-trained models. To explicitly transfer only semantic knowledge to the target language, we propose two groups of losses tailored for semantic and syntactic encoding and disentanglement. Experimental results on three multilingual MRC datasets (i.e., XQuAD, MLQA, and TyDi QA) demonstrate the effectiveness of our proposed approach over models based on mBERT and XLM-100.
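The disentanglement objective can be sketched as a combination of two terms, assuming cosine-based alignment and orthogonality penalties: parallel sentences should agree in the semantic subspace, while semantic and syntactic vectors of the same sentence are pushed apart. The weights and exact formulation are illustrative, not the paper's loss terms.

```python
# Toy sketch of a semantic/syntactic disentanglement objective: align semantic
# vectors of translation pairs, penalize overlap between semantic and syntactic
# vectors of the same sentence. Weights and formulation are assumptions.

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def disentanglement_loss(sem_src, sem_tgt, syn_src, syn_tgt, alpha=1.0, beta=0.1):
    # Semantic alignment: translations should share semantic representations.
    semantic_loss = 1.0 - cosine(sem_src, sem_tgt)
    # Disentanglement: semantic and syntactic vectors of a sentence stay near-orthogonal.
    ortho_loss = cosine(sem_src, syn_src) ** 2 + cosine(sem_tgt, syn_tgt) ** 2
    return alpha * semantic_loss + beta * ortho_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vecs = [rng.normal(size=32) for _ in range(4)]
    print(round(disentanglement_loss(*vecs), 4))
```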