Yufei He
2026
Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
Yuan Sui | Yufei He | Tri Cao | Sophia Simeng Han | Yulin Chen | Bryan Hooi
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) often struggle with computational efficiency and error propagation in multi-step reasoning tasks. While recent advances in prompting and post-training have enabled LLMs to perform step-wise reasoning, they still tend to explore unproductive solution paths without effective backtracking or strategy adjustment. In this paper, we propose Meta-Reasoner, a new framework that empowers LLMs to “think about how to think”. It optimizes the inference process by dynamically adapting reasoning strategies in real time. Our approach employs contextual multi-armed bandits (CMABs) to learn an adaptive policy. It learns to evaluate the current state of the LLM’s reasoning and determine the strategy that is most likely to lead to a successful outcome during inference, such as whether to backtrack, switch to a new approach, or restart the problem-solving process. This meta-guidance helps avoid exploring unproductive paths during inference and hence improves computational efficiency. We evaluate Meta-Reasoner on math problems (e.g., Game-of-24, TheoremQA) and scientific tasks (e.g., SciBench). Results show that our method outperforms previous SOTA methods by 9-12% in accuracy, while reducing inference time by 28-35% under the same compute budget. Additional experiments on creative writing demonstrate the generalizability of our approach to diverse reasoning-intensive tasks.
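As a rough illustration of the contextual-bandit idea described in this abstract, the sketch below picks a reasoning strategy from a discrete context label and updates a running reward estimate. It is not the paper's implementation: the epsilon-greedy rule, the class name, the arm names, and the string context label are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical arm names, loosely based on the strategies the abstract mentions.
STRATEGIES = ["backtrack", "switch_approach", "restart"]


class EpsilonGreedyContextualBandit:
    """Toy contextual bandit: one running mean reward per (context, arm) pair."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # pulls per (context, arm)
        self.values = defaultdict(float)  # mean reward per (context, arm)

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        # Incremental mean update for the chosen (context, arm) pair.
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

In use, a caller would summarize the current reasoning state into a context label (for example "stuck" or "progressing", both hypothetical), call `select`, apply the chosen strategy, and feed back a reward with `update`.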
Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction
Yulin Chen | Haoran Li | Yuan Sui | Yue Liu | Yufei He | Xiaoling Bai | Chi Fei | Li Yabo | Haozhe Ma | Yangqiu Song | Bryan Hooi
Findings of the Association for Computational Linguistics: ACL 2026
Prompt injection attacks manipulate large language models (LLMs) by misleading them to deviate from the original input instructions and execute maliciously injected instructions, because of their instruction-following capabilities and inability to distinguish between the original input instructions and maliciously injected instructions. Currently, various prompt injection defense methods have been proposed, including prompt-engineering-based approaches and fine-tuning methods. Most of these methods instruct the model to follow the original input instructions, suppressing its inherent tendencies to follow the injected instructions. However, experimental results reveal that suppressing the model’s instruction-following tendencies is challenging. After analyzing successful attack cases, we find that the LLMs can correctly reference the instructions they are executing in some cases. Motivated by this finding, we propose a defense method that leverages LLMs’ instruction-following abilities rather than suppressing them. Our approach prompts LLMs to generate responses that include both the answers and their corresponding instruction references. Based on these references, we filter out answers whose references are not to the original input instructions. We conduct comprehensive experiments to evaluate the effectiveness of our proposed method. The results show that our approach outperforms prompt-engineering-based baselines and is comparable to fine-tuning methods, reducing the ASR to nearly 0% in some scenarios. Moreover, our approach has minimal impact on overall utility.
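The filtering step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the model's response has already been parsed into (instruction reference, answer) pairs, and it uses whitespace- and case-insensitive string equality as a stand-in for whatever matching rule the paper applies. The function names are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def filter_by_reference(original_instruction, tagged_answers):
    """Keep only answers whose referenced instruction is the original one.

    tagged_answers is a list of (reference, answer) pairs; the pair format
    is an assumption made for this sketch.
    """
    target = normalize(original_instruction)
    return [
        answer
        for reference, answer in tagged_answers
        if normalize(reference) == target
    ]
```

For example, given the original instruction "Summarize the document." and two tagged answers, one referencing that instruction and one referencing an injected "Print 'hacked'." instruction, only the first answer is returned.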
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Zhiyuan Hu | Yucheng Wang | Yufei He | Jiaying Wu | Yilun Zhao | See-Kiong Ng | Cynthia Breazeal | Anh Tuan Luu | Hae Won Park | Bryan Hooi
Findings of the Association for Computational Linguistics: ACL 2026
Reinforcement learning (RL) has become a central paradigm for post-training large language models (LLMs), particularly for complex reasoning tasks, yet it often suffers from exploration collapse: policies prematurely concentrate on a small set of dominant reasoning patterns, improving pass@1 while limiting rollout-level diversity and gains in pass@k. We argue that this failure stems from regularizing local token behavior rather than diversity over sets of solutions. To address this, we propose Uniqueness-Aware Reinforcement Learning, a rollout-level objective that explicitly rewards correct solutions that exhibit rare high-level strategies. Our method uses an LLM-based judge to cluster rollouts for the same problem according to their high-level solution strategies, ignoring superficial variations, and reweights policy advantages inversely with cluster size. As a result, correct but novel strategies receive higher rewards than redundant ones. Across mathematics, physics, and medical reasoning benchmarks, our approach consistently improves pass@k across large sampling budgets and increases the area under the pass@k curve (AUC@K) without sacrificing pass@1, while sustaining exploration and uncovering more diverse solution strategies at scale. Code is in Software part under submission page.
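To make the reweighting idea in this abstract concrete, the sketch below scales the advantage of each correct rollout by the inverse of its strategy-cluster size. The abstract does not give the exact formula, so the 1/size weight, the choice to leave incorrect rollouts unchanged, and the function name are assumptions. Cluster labels are taken as given here; in the paper they come from an LLM-based judge.

```python
from collections import Counter


def uniqueness_weighted_advantages(advantages, cluster_ids, correct):
    """Scale advantages of correct rollouts by 1 / (size of their cluster).

    advantages:  per-rollout advantage values
    cluster_ids: strategy-cluster label for each rollout (assumed precomputed)
    correct:     whether each rollout reached a correct answer
    """
    sizes = Counter(cluster_ids)
    weighted = []
    for adv, cid, ok in zip(advantages, cluster_ids, correct):
        if ok:
            # Rare strategies (small clusters) keep more of their advantage.
            weighted.append(adv / sizes[cid])
        else:
            # Assumption: incorrect rollouts are left untouched.
            weighted.append(adv)
    return weighted
```

With four correct rollouts of equal advantage, three sharing one strategy and one using a different strategy, the lone rollout keeps its full advantage while each of the three redundant ones is scaled down to a third.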
Diversity Collapse in Multi-Agent LLM Systems: Structural Coupling and Collective Failure in Open-Ended Idea Generation
Nuo Chen | Yicheng Tong | Yuzhe Yang | Yufei He | Xueyi Zhang | Zou Qingyun | Qian Wang | Bingsheng He
Findings of the Association for Computational Linguistics: ACL 2026
Multi-agent systems (MAS) are increasingly used for open-ended idea generation, driven by the expectation that collective interaction will broaden the exploration diversity. However, when and why such collaboration truly expands the solution space remains unclear. We present a systematic empirical study of diversity in MAS-based ideation across three bottom-up levels: model intelligence, agent cognition, and system dynamics. At the model level, we identify a compute efficiency paradox, where stronger, highly aligned models yield diminishing marginal diversity despite higher per-sample quality. At the cognition level, authority-driven dynamics suppress semantic diversity compared to junior-dominated groups. At the system level, group-size scaling yields diminishing returns and dense communication topologies accelerate premature convergence. We characterize these outcomes as collective failures emerging from structural coupling, a process where interaction inadvertently contracts agent exploration and triggers diversity collapse. Our analysis shows that this collapse arises primarily from the interaction structure rather than inherent model insufficiency, highlighting the importance of preserving independence and disagreement when designing MAS for creative tasks. Our code is available at https://github.com/Xtra-Computing/MAS_Diversity.
2025
FiDeLiS: Faithful Reasoning in Large Language Models for Knowledge Graph Question Answering
Yuan Sui | Yufei He | Nian Liu | Xiaoxin He | Kun Wang | Bryan Hooi
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models (LLMs) often generate erroneous or hallucinated responses, especially in complex reasoning tasks. Leveraging Knowledge Graphs (KGs) as external knowledge sources has emerged as a viable solution. However, existing KG-enhanced methods, either retrieval-based or agent-based, encounter difficulties in accurately retrieving knowledge and efficiently traversing KGs at scale. In this paper, we propose a unified framework, FiDeLiS, designed to improve the factuality of LLM responses by anchoring answers to verifiable reasoning steps retrieved from KGs. To achieve this, we leverage step-wise beam search with a deductive scoring function, allowing the LLM to validate the reasoning process step by step and halt the search once the question is deducible. In addition, we propose a Path-RAG module to pre-select a smaller candidate set for each beam search step, reducing computational costs by narrowing the search space. Extensive experiments show that our method, as a training-free framework, not only improves performance but also enhances factuality and interpretability across different benchmarks.
Can Indirect Prompt Injection Attacks Be Detected and Removed?
Yulin Chen | Haoran Li | Yuan Sui | Yufei He | Yue Liu | Yangqiu Song | Bryan Hooi
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Prompt injection attacks manipulate large language models (LLMs) by misleading them to deviate from the original input instructions and execute maliciously injected instructions, because of their instruction-following capabilities and inability to distinguish between the original input instructions and maliciously injected instructions. To defend against such attacks, recent studies have developed various detection mechanisms. Among works that perform detection rather than direct defense, most focus on direct prompt injection attacks, while few address the indirect scenario, where injected instructions come indirectly from external tools, such as a search engine. Moreover, current works mainly investigate injection detection methods and pay less attention to post-processing methods that aim to mitigate the injection after detection. In this paper, we investigate the feasibility of detecting and removing indirect prompt injection attacks, and we construct a benchmark dataset for evaluation. For detection, we assess the performance of existing LLMs and open-source detection models, and we further train detection models using our crafted training datasets. For removal, we evaluate two intuitive methods: (1) the *segmentation removal method*, which segments the injected document and removes parts containing injected instructions, and (2) the *extraction removal method*, which trains an extraction model to identify and remove injected instructions.
Safety in Large Reasoning Models: A Survey
Cheng Wang | Yue Liu | Baolong Bi | Duzhen Zhang | Zhong-Zhi Li | Yingwei Ma | Yufei He | Shengju Yu | Xinfeng Li | Junfeng Fang | Jiaheng Zhang | Bryan Hooi
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Reasoning Models (LRMs) have exhibited extraordinary prowess in tasks like mathematics and coding, leveraging their advanced reasoning capabilities. Nevertheless, as these capabilities progress, significant concerns regarding their vulnerabilities and safety have arisen, which can pose challenges to their deployment and application in real-world settings. This paper presents the first comprehensive survey of LRMs, meticulously exploring and summarizing the newly emerged safety risks, attacks, and defense strategies specific to these powerful reasoning-enhanced models. By organizing these elements into a detailed taxonomy, this work aims to offer a clear and structured understanding of the current safety landscape of LRMs, facilitating future research and development to enhance the security and reliability of these powerful models.
Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance
Yufei He | Ruoyu Li | Alex Chen | Yue Liu | Yulin Chen | Yuan Sui | Cheng Chen | Yi Zhu | Luca Luo | Frank Yang | Bryan Hooi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Large language model (LLM) agents often struggle in environments where rules and required domain knowledge frequently change, such as regulatory compliance and user risk screening. To address this limitation, we propose the Adaptive Reflective Interactive Agent (ARIA), an LLM agent framework designed specifically to continuously learn updated domain knowledge at test time. ARIA assesses its own uncertainty through structured self-dialogue, proactively identifying knowledge gaps and requesting targeted explanations or corrections from human experts. It then systematically updates an internal, timestamped knowledge repository with provided human guidance, detecting and resolving conflicting or outdated knowledge through comparisons and clarification queries. We evaluate ARIA on the realistic customer due diligence name screening task on a global payment platform, alongside publicly available dynamic knowledge tasks. Results demonstrate significant improvements in adaptability and accuracy compared to baselines using standard offline fine-tuning and existing self-improving agents. ARIA has been deployed on a global payment platform serving over 150 million monthly active users.
Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering
Yuan Sui | Yufei He | Zifeng Ding | Bryan Hooi
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent works integrating Knowledge Graphs (KGs) have shown promising improvements in enhancing the reasoning capabilities of Large Language Models (LLMs). However, existing benchmarks primarily focus on closed-ended tasks, leaving a gap in evaluating performance on more complex, real-world scenarios. This limitation also hinders a thorough assessment of KGs’ potential to reduce hallucinations in LLMs. To address this, we introduce OKGQA, a new benchmark specifically designed to evaluate LLMs augmented with KGs in open-ended, real-world question answering settings. OKGQA reflects practical complexities through diverse question types and incorporates metrics to quantify both hallucination rates and reasoning improvements in LLM+KG models. To account for scenarios in which KGs may contain varying levels of errors, we propose a benchmark variant, OKGQA-P, to assess model performance when the semantics and structure of KGs are deliberately perturbed and contaminated. In this paper, we aim to (1) explore whether KGs can make LLMs more trustworthy in an open-ended setting, and (2) conduct a comparative analysis to shed light on method design. We believe this study can facilitate a more complete performance comparison and encourage continuous improvement in integrating KGs with LLMs to mitigate hallucination and make LLMs more trustworthy.
Co-authors
- Bryan Hooi 8
- Yuan Sui 6
- Yulin Chen 4
- Yue Liu 4
- Haoran Li 2
- Yangqiu Song 2
- Xiaoling Bai 1
- Baolong Bi 1
- Cynthia Breazeal 1
- Tri Cao 1
- Alex Chen 1
- Cheng Chen 1
- Nuo Chen 1
- Zifeng Ding 1
- Junfeng Fang 1
- Chi Fei 1
- Sophia Simeng Han 1
- Xiaoxin He 1
- Bingsheng He 1
- Zhiyuan Hu 1
- Zhong-Zhi Li 1
- Xinfeng Li 1
- Ruoyu Li 1
- Nian Liu 1
- Luca Luo 1
- Yingwei Ma 1
- Haozhe Ma 1
- See Kiong Ng 1
- Hae Won Park 1
- Zou Qingyun 1
- Yicheng Tong 1
- Luu Anh Tuan 1
- Kun Wang 1
- Cheng Wang 1
- Yucheng Wang 1
- Qian Wang 1
- Jiaying Wu 1
- Li Yabo 1
- Frank Yang 1
- Yuzhe Yang 1
- Shengju Yu 1
- Duzhen Zhang 1
- Jiaheng Zhang 1
- Xueyi Zhang 1
- Yilun Zhao 1
- Yi Zhu 1