Fangchao Liu

2025

pdf bib abs
P3: Prompts Promote Prompting
Xinyu Zhang | Yuanquan Hu | Fangchao Liu | Zhicheng Dou
Findings of the Association for Computational Linguistics: ACL 2025

Current large language model (LLM) applications often employ multi-component prompts, comprising both system and user prompts, to guide model behaviors. While recent advancements have demonstrated the efficacy of automatically optimizing either the system or user prompt to boost performance, such unilateral approaches often yield suboptimal outcomes due to the interdependent nature of these components. In this work, we introduce P3, a novel self-improvement framework that concurrently optimizes both system and user prompts through an iterative process. The offline optimized prompts are further leveraged to promote online prompting by performing query-dependent prompt optimization. Extensive experiments on general tasks (e.g., Arena-hard and Alpaca-eval) and reasoning tasks (e.g., GSM8K and GPQA) demonstrate that P3 achieves superior performance in the realm of automatic prompt optimization. Our results highlight the effectiveness of a holistic optimization strategy in enhancing LLM performance across diverse domains.

In-context learning (ICL) performance heavily relies on the quality and ordering of demonstrations. Iterative selection (IS) is a promising approach to address this issue, but existing IS methods face two key challenges: the oversimplification of process reward signals that guide intermediate steps (often using single-dimensional metrics) and the lack of outcome reward signals that directly optimize final-task accuracy (relying solely on binary terminal feedback like correct/incorrect predictions). To address these issues, we propose a reinforcement learning method R-Mix which models iterative demonstration selection as a Markov Decision Process (MDP), crafting hybrid reward signals — combining outcome-based accuracy signals (i.e., outcome rewards) with process-oriented signals (i.e, process rewards) like stepwise influence and label entropy improvement. Our analysis reveals a positive but trade-off relationship between outcome rewards and process rewards, underscoring the importance of both components for effective policy optimization. We further introduce a dual-head policy architecture that explicitly decouples input-semantic relevance and label-content compatibility. Experiments across NLP benchmarks demonstrate superior performance over state-of-the-art methods, with ablation studies validating the necessity of both reward components and architectural disentanglement. Our work has deeply explored the effective potential of ICL through demonstration selection.

In-Context Learning (ICL) is an essential emergent ability of Large Language Models (LLMs), and recent studies introduce CoT to exemplars of ICL to enhance the reasoning capability, especially in mathematics tasks. However, given the continuous advancement of model capabilities, it remains unclear whether CoT exemplars still benefit recent, stronger models in such tasks. Through systematic experiments, we find that for recent strong models such as the Qwen2.5 series, adding traditional CoT exemplars does not improve reasoning performance compared to Zero-Shot CoT. Instead, their primary function is to align the output format with human expectations. We further investigate the effectiveness of enhanced CoT exemplars, constructed using answers from advanced models such as Qwen2.5-Max and DeepSeek-R1. Experimental results indicate that these enhanced exemplars still fail to improve the model’s reasoning performance. Further analysis reveals that models tend to ignore the exemplars and focus primarily on the instructions, leading to no observable gain in reasoning ability. Overall, our findings highlight the limitations of the current ICL+CoT framework in mathematical reasoning, calling for a re-examination of the ICL paradigm and the definition of exemplars.

2022

pdf bib abs
Pre-training to Match for Unified Low-shot Relation Extraction
Fangchao Liu | Hongyu Lin | Xianpei Han | Boxi Cao | Le Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Low-shot relation extraction (RE) aims to recognize novel relations with very few or even no samples, which is critical in real scenario application. Few-shot and zero-shot RE are two representative low-shot RE tasks, which seem to be with similar target but require totally different underlying abilities. In this paper, we propose Multi-Choice Matching Networks to unify low-shot relation extraction. To fill in the gap between zero-shot and few-shot RE, we propose the triplet-paraphrase meta-training, which leverages triplet paraphrase to pre-train zero-shot label matching ability and uses meta-learning paradigm to learn few-shot instance summarizing ability. Experimental results on three different low-shot RE tasks show that the proposed method outperforms strong baselines by a large margin, and achieve the best performance on few-shot RE leaderboard.

pdf bib abs
Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View
Boxi Cao | Hongyu Lin | Xianpei Han | Fangchao Liu | Le Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Prompt-based probing has been widely used in evaluating the abilities of pretrained language models (PLMs). Unfortunately, recent studies have discovered such an evaluation may be inaccurate, inconsistent and unreliable. Furthermore, the lack of understanding its inner workings, combined with its wide applicability, has the potential to lead to unforeseen risks for evaluating and applying PLMs in real-world applications. To discover, understand and quantify the risks, this paper investigates the prompt-based probing from a causal view, highlights three critical biases which could induce biased results and conclusions, and proposes to conduct debiasing via causal intervention. This paper provides valuable insights for the design of unbiased datasets, better probing frameworks and more reliable evaluations of pretrained language models. Furthermore, our conclusions also echo that we need to rethink the criteria for identifying better pretrained language models.

2021

pdf bib abs
Element Intervention for Open Relation Extraction
Fangchao Liu | Lingyong Yan | Hongyu Lin | Xianpei Han | Le Sun
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Open relation extraction aims to cluster relation instances referring to the same underlying relation, which is a critical step for general relation extraction. Current OpenRE models are commonly trained on the datasets generated from distant supervision, which often results in instability and makes the model easily collapsed. In this paper, we revisit the procedure of OpenRE from a causal view. By formulating OpenRE using a structural causal model, we identify that the above-mentioned problems stem from the spurious correlations from entities and context to the relation type. To address this issue, we conduct Element Intervention, which intervene on the context and entities respectively to obtain the underlying causal effects of them. We also provide two specific implementations of the interventions based on entity ranking and context contrasting. Experimental results on unsupervised relation extraction datasets show our method to outperform previous state-of-the-art methods and is robust across different datasets.