Maolin Wang
2026
MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search
Sheng Zhang | Junyi Li | Yingyi Zhang | Pengyue Jia | Yichao Wang | Xiaowei Qian | Wenlin Zhang | Maolin Wang | Yong Liu | Xiangyu Zhao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent advances in large language models (LLMs) have scaled the potential for reasoning and agentic search, wherein models autonomously plan, retrieve, and reason over external knowledge to answer complex queries. However, the iterative think–search loop accumulates long system memories, leading to a memory dilution problem. In addition, existing memory management methods struggle to capture fine-grained semantic relations between queries and documents and often lose substantial information. Therefore, we propose MemSearch-o1, an agentic search framework built on reasoning-aligned memory growth and retracing. MemSearch-o1 dynamically grows fine-grained memory fragments from memory seed tokens drawn from the queries, then retraces and deeply refines the memory via a contribution function, and finally reorganizes a globally connected memory path. This shifts memory management from stream-like concatenation to structured, token-level growth with path-based reasoning. Experiments on eight benchmark datasets show that MemSearch-o1 substantially mitigates memory dilution and more effectively activates the reasoning potential of diverse LLMs, establishing a solid foundation for memory-aware agentic intelligence.
Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression
Jingyu Peng | Maolin Wang | Nan Wang | Jiatong Li | Yuchen Li | Yuyang Ye | Wanyu Wang | Pengyue Jia | Kai Zhang | Xiangyu Zhao
Findings of the Association for Computational Linguistics: ACL 2026
Despite substantial advancements in aligning LLMs with human values, current safety mechanisms remain susceptible to jailbreak attacks. We attribute this vulnerability to the distributional discrepancies between alignment-oriented prompts and malicious prompts. To investigate this, and drawing inspiration from logic-driven NLP tasks, we introduce LogiBreak, a universal black-box jailbreak method that utilizes logical expression translation to bypass LLM safety mechanisms. By converting harmful natural language prompts into formal logical expressions, LogiBreak exploits the distributional gap between alignment data and logic-expressed inputs, preserving the underlying semantic intent and readability while evading safety constraints. Furthermore, to fill the gap of existing benchmarks that lack systematic resources specifically targeting logical expression-based attacks against LLM robustness, we construct a novel multilingual logical expression jailbreak dataset for evaluation. Our evaluations of LogiBreak in five languages demonstrate its effectiveness and generalizability in various linguistic contexts. The code is available at https://github.com/Applied-Machine-Learning-Lab/ACL2026_Logibreak.
SEARCH-R: Structured Entity-Aware Retrieval with Chain-of-Reasoning Navigator for Multi-hop Question Answering
FU Yuqing | Yimin Deng | Wanyu Wang | Yuhao Wang | Yejing Wang | Hongshi Liu | Yiqi Wang | Xiao Han | Maolin Wang | Guoshuai Zhao | Yi Chang | Xiangyu Zhao
Findings of the Association for Computational Linguistics: ACL 2026
Multi-hop Question Answering (MHQA) aims to answer questions that require multi-step reasoning. The complexity of user queries, coupled with potential knowledge deficiencies in Large Language Models (LLMs), gives rise to two pivotal challenges that underpin the performance on this task: the correct identification of the reasoning path and the accurate retrieval of essential knowledge. Existing approaches primarily rely on prompt-based methods to generate reasoning paths, which are further combined with traditional sparse or dense retrieval to produce the final answer. However, the generation of reasoning paths commonly lacks effective control over the generative process, thus leading the reasoning astray. Meanwhile, the retrieval methods over-rely on knowledge matching or similarity scores rather than evaluating the practical utility of the information, resulting in retrieving homogeneous or non-useful information. Therefore, we propose a Structured Entity-Aware Retrieval with Chain-of-Reasoning Navigator framework named SEARCH-R. Specifically, SEARCH-R trains an end-to-end reasoning path navigator, which is able to provide a powerful sub-question decomposer by fine-tuning the Llama3.1-8B model. Moreover, a novel dependency tree-based retrieval is designed to evaluate the informational contribution of the document quantitatively. Extensive experiments on three challenging multi-hop datasets validate the effectiveness of the proposed framework. The code and dataset are available at: https://github.com/Applied-Machine-Learning-Lab/ACL2026_SEARCH-R.
BalanceSFT: Improving LLM Function Calling with Balanced Training Signals and Data Hardness
Bingguang Hao | Zengzhuang Xu | Maolin Wang | Yuntao Wen | Yicheng Chen | Cunyin Peng | Long Chen | Xiangyu Zhao | Jinjie Gu | Chenyi Zhuang | Ji Zhang
Findings of the Association for Computational Linguistics: ACL 2026
While Supervised Fine-Tuning (SFT) is the prevailing method for equipping Large Language Models (LLMs) with function calling capabilities, its effectiveness is often compromised by two critical challenges: 1) **Imbalanced Training Signals**, where lengthy Chain-of-Thought (CoT) reasoning tokens dominate the training signals over concise function calls in the learning objective, and 2) **Imbalanced Data Hardness**, characterized by a scarcity of hard training examples. To overcome these limitations, we propose Balanced Supervised Fine-tuning (**BalanceSFT**), a novel framework that incorporates two key components: a Self-adjusted Signal Balancing (SSB) loss that employs a learnable hyperparameter to dynamically adjust the token contributions of CoT reasoning and function calls, together with a Hard Data Re-sampling (HDR) strategy that establishes a feedback loop to selectively generate new, high-quality complex data guided by model errors. Extensive experiments demonstrate the effectiveness of our proposed BalanceSFT framework. With BalanceSFT, a 7B model achieves function calling performance that surpasses state-of-the-art models like GPT-5. Our code, models, and dataset are open-sourced.
MTA: A Merge-then-Adapt Framework for Personalized Large Language Models
Xiaopeng Li | Yuanjin Zheng | Wanyu Wang | Wenlin Zhang | Pengyue Jia | Yingyi Zhang | Haiying He | Mengyang Ma | Yiqi Wang | Maolin Wang | Xuetao Wei | Xiangyu Zhao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Personalized Large Language Models (PLLMs) aim to align model outputs with individual user preferences, a crucial capability for user-centric applications. However, the prevalent approach of fine-tuning a separate module for each user faces two major limitations: (1) storage costs scale linearly with the number of users, rendering the method unscalable; and (2) fine-tuning a static model from scratch often yields suboptimal performance for users with sparse data. To address these challenges, we propose MTA, a Merge-then-Adapt framework for PLLMs. MTA comprises three key stages. First, we construct a shared Meta-LoRA Bank by selecting anchor users and pre-training meta-personalization traits within meta-LoRA modules. Second, to ensure scalability and enable dynamic personalization combination beyond static models, we introduce an Adaptive LoRA Fusion stage. This stage retrieves and dynamically merges the most relevant anchor meta-LoRAs to synthesize a user-specific one, thereby eliminating the need for user-specific storage and supporting more flexible personalization. Third, we propose a LoRA Stacking for Few-Shot Personalization stage, which applies an additional ultra-low-rank, lightweight LoRA module on top of the merged LoRA. Fine-tuning this module enables effective personalization under few-shot settings. Extensive experiments on the LaMP benchmark demonstrate that our approach outperforms existing SOTA methods across multiple tasks. Our code is also available.
2025
Bridging Relevance and Reasoning: Rationale Distillation in Retrieval-Augmented Generation
Pengyue Jia | Derong Xu | Xiaopeng Li | Zhaocheng Du | Xiangyang Li | Yichao Wang | Yuhao Wang | Qidong Liu | Maolin Wang | Huifeng Guo | Ruiming Tang | Xiangyu Zhao
Findings of the Association for Computational Linguistics: ACL 2025
The reranker and generator are two critical components in the Retrieval-Augmented Generation (RAG) pipeline, responsible for ranking relevant documents and generating responses. However, due to differences in pre-training data and objectives, there is an inevitable gap between the documents ranked as relevant by the reranker and those required by the generator to support answering the query. To address this gap, we propose RADIO, a novel and practical preference alignment framework with RAtionale DIstillatiOn. Specifically, we first propose a rationale extraction method that leverages the reasoning capabilities of large language models (LLMs) to extract the rationales necessary for answering the query. Subsequently, a rationale-based alignment process is designed to rerank the documents based on the extracted rationales and to fine-tune the reranker to align the preferences. We conduct extensive experiments on two tasks across three datasets to demonstrate the effectiveness of our approach compared to baseline methods. Our code is released online to ease reproduction.
Stepwise Reasoning Disruption Attack of LLMs
Jingyu Peng | Maolin Wang | Xiangyu Zhao | Kai Zhang | Wanyu Wang | Pengyue Jia | Qidong Liu | Ruocheng Guo | Qi Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have made remarkable strides in complex reasoning tasks, but their safety and robustness in reasoning processes remain unexplored, particularly in third-party platforms that facilitate user interactions via APIs. Existing attacks on LLM reasoning are constrained by specific settings or lack of imperceptibility, limiting their feasibility and generalizability. To address these challenges, we propose the Stepwise rEasoning Error Disruption (SEED) attack, which subtly injects errors into prior reasoning steps to mislead the model into producing incorrect subsequent reasoning and final answers. Unlike previous methods, SEED is compatible with zero-shot and few-shot settings, maintains the natural reasoning flow, and ensures covert execution without modifying the instruction. Extensive experiments on four datasets across four different models demonstrate SEED’s effectiveness, revealing the vulnerabilities of LLMs to disruptions in reasoning processes. These findings underscore the need for greater attention to the robustness of LLM reasoning to ensure safety in practical applications. Our code is available at: https://github.com/Applied-Machine-Learning-Lab/SEED-Attack
2015
Co-authors
- Xiangyu Zhao 7
- Pengyue Jia 5
- Wanyu Wang 4
- Xiaopeng Li 2
- Qidong Liu 2
- Jingyu Peng 2
- Yichao Wang 2
- Yuhao Wang 2
- Yiqi Wang 2
- Yingyi Zhang 2
- Wenlin Zhang 2
- Kai Zhang 2
- Yi Chang 1
- Yicheng Chen 1
- Long Chen 1
- Yimin Deng 1
- Zhaocheng Du 1
- Jinjie Gu 1
- Huifeng Guo 1
- Ruocheng Guo 1
- Xiao Han 1
- Bingguang Hao 1
- Haiying He 1
- Mingxuan Huang 1
- Junyi Li 1
- Jiatong Li 1
- Yuchen Li 1
- Xiangyang Li 1
- Yong Liu 1
- Hongshi Liu 1
- Qi Liu 1
- Mengyang Ma 1
- Shervin Malmasi 1
- Cunyin Peng 1
- Xiaowei Qian 1
- Ruiming Tang 1
- Nan Wang 1
- Yejing Wang 1
- Xuetao Wei 1
- Yuntao Wen 1
- Zengzhuang Xu 1
- Derong Xu 1
- Yuyang Ye 1
- FU Yuqing 1
- Sheng Zhang 1
- Ji Zhang 1
- Guoshuai Zhao 1
- Yuanjin Zheng 1
- Chenyi Zhuang 1