Ante Wang
2026
UR2 : Unify RAG and Reasoning through Reinforcement Learning
Weitao Li | Boran Xiang | Xiaolong Wang | Jingyi Ren | Ante Wang | Zhinan Gou | Weizhi Ma | Yang Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Weitao Li | Boran Xiang | Xiaolong Wang | Jingyi Ren | Ante Wang | Zhinan Gou | Weizhi Ma | Yang Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have shown strong capabilities through two complementary paradigms: Retrieval-Augmented Generation (RAG) for knowledge grounding and Reinforcement Learning from Verifiable Rewards (RLVR) for complex reasoning. However, existing attempts to unify these paradigms remain narrow in scope, typically limited to open-domain QA with fixed retrieval settings, which constrains generalization to broader domains. To address this limitation, we propose **UR2** (**U**nified **R**AG and **R**easoning), a general reinforcement learning framework that dynamically coordinates retrieval and reasoning. UR2 introduces two key designs: a difficulty-aware curriculum that selectively invokes retrieval only for challenging instances, and a hybrid knowledge access strategy that combines domain-specific offline corpora with on-the-fly LLM-generated summaries. Together, these components mitigate the imbalance between retrieval and reasoning and improve robustness to noisy information. Experiments on open-domain QA, MMLU-Pro, medical, and mathematical reasoning tasks show that UR2, built on Qwen-2.5-3/7B and LLaMA-3.1-8B, consistently outperforms existing RAG and RL baselines, and achieves performance comparable to GPT-4o-mini and GPT-4.1-mini on several benchmarks. We will release all code, models, and data.
Beyond "I Don’t Know": Evaluating LLM Self-Awareness in Discriminating Data and Model Uncertainty
Jingyi Ren | Ante Wang | Yunghwei Lai | Xiaolong Wang | Linlu Gong | Weitao Li | Weizhi Ma | Yang Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Jingyi Ren | Ante Wang | Yunghwei Lai | Xiaolong Wang | Linlu Gong | Weitao Li | Weizhi Ma | Yang Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reliable Large Language Models (LLMs) should abstain when confidence is insufficient. However, prior studies often treat refusal as a generic "I don’t know”, failing to distinguish input-level ambiguity (data uncertainty) from capability limitations (model uncertainty). This lack of distinction limits downstream action decisions like requesting clarification or invoking external tools.In this work, we introduce UA-Bench, a benchmark of over 3,500 questions drawn from six datasets spanning knowledge-intensive and reasoning-intensive tasks, designed to evaluate explicit uncertainty attribution.An evaluation of 18 frontier LLMs shows that even state-of-the-art models struggle to reliably discriminate between data uncertainty and model uncertainty, and that high answer accuracy does not necessarily imply strong uncertainty attribution ability.To narrow this gap, we propose a lightweight data synthesis and reinforcement learning strategy. Experiments on both Qwen3-4B-Instruct-2507 and Qwen3-8B in thinking mode show that the proposed method improves uncertainty attribution while preserving answer accuracy.Our code and data are publicly available now.
HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment
Zhanyu Liu | Qingguo Hu | Ante Wang | Chenqing Liu | Zhishang Xiang | Hui Li | Delai Qiu | Jinsong Su
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Zhanyu Liu | Qingguo Hu | Ante Wang | Chenqing Liu | Zhishang Xiang | Hui Li | Delai Qiu | Jinsong Su
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement Learning with Verifiable Reward (RLVR) has proven effective for training reasoning-oriented large language models, but existing methods largely assume high-resource settings with abundant training data. In low-resource scenarios, RLVR is prone to more severe entropy collapse, which substantially limits exploration and degrades reasoning performance. To address this issue, we propose **H**ybrid-domain **E**ntropy dynamics **AL**ignment (HEAL), a framework tailored for few-shot RLVR. HEAL first selectively incorporates high-value general-domain data to promote more diverse exploration. Then, we introduce Entropy Dynamics Alignment (EDA), a reward mechanism that aligns trajectory-level entropy dynamics between the target and general domains, capturing both entropy magnitude and fine-grained variation. Through this alignment, EDA not only further mitigates entropy collapse but also encourages the policy to acquire more diverse exploration behaviors from the general domain. Experiments across multiple domains show that HEAL consistently improves few-shot RLVR performance. Notably, using only 32 target-domain samples, HEAL matches or even surpasses full-shot RLVR trained with 1K target-domain samples.
2025
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Yujie Lin | Ante Wang | Moye Chen | Jingyao Liu | Hao Liu | Jinsong Su | Xinyan Xiao
Findings of the Association for Computational Linguistics: ACL 2025
Yujie Lin | Ante Wang | Moye Chen | Jingyao Liu | Hao Liu | Jinsong Su | Xinyan Xiao
Findings of the Association for Computational Linguistics: ACL 2025
Recently, inference-time scaling of chain-of-thought (CoT) has been demonstrated as a promising approach for addressing multi-modal reasoning tasks.While existing studies have predominantly centered on text-based thinking, the integration of both visual and textual modalities within the reasoning process remains unexplored.In this study, we pioneer the exploration of inference-time scaling with multi-modal thought, aiming to bridge this gap.To provide a comprehensive analysis, we systematically investigate popular sampling-based and tree search-based inference-time scaling methods on 10 challenging tasks spanning various domains.Besides, we uniformly adopt a consistency-enhanced verifier to ensure effective guidance for both methods across different thought paradigms.Results show that multi-modal thought promotes better performance against conventional text-only thought, and blending the two types of thought fosters more diverse thinking.Despite these advantages, multi-modal thoughts necessitate higher token consumption for processing richer visual inputs, which raises concerns in practical applications.We hope that our findings on the merits and drawbacks of this research line will inspire future works in the field. The code will be released upon acceptance.
A Multi-Agent Framework with Automated Decision Rule Optimization for Cross-Domain Misinformation Detection
Hui Li | Ante Wang | Kunquan Li | Zhihao Wang | Liang Zhang | Delai Qiu | Qingsong Liu | Jinsong Su
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Hui Li | Ante Wang | Kunquan Li | Zhihao Wang | Liang Zhang | Delai Qiu | Qingsong Liu | Jinsong Su
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Misinformation spans various domains, but detection methods trained on specific domains often perform poorly when applied to others. With the rapid development of Large Language Models (LLMs), researchers have begun to utilize LLMs for cross-domain misinformation detection. However, existing LLM-based methods often fail to adequately analyze news in the target domain, limiting their detection capabilities. More importantly, these methods typically rely on manually designed decision rules, which are limited by domain knowledge and expert experience, thus limiting the generalizability of decision rules to different domains. To address these issues, we propose a Multi-Agent Framework for cross-domain misinformation detection with Automated Decision Rule Optimization (MARO). Under this framework, we first employs multiple expert agents to analyze target-domain news. Subsequently, we introduce a question-reflection mechanism that guides expert agents to facilitate higher-quality analysis. Furthermore, we propose a decision rule optimization approach based on carefully designed cross-domain validation tasks to iteratively enhance decision rule effectiveness across domains. Experimental results and analysis on commonly used datasets demonstrate that MARO achieves significant improvements over existing methods.
Don’t Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
Ante Wang | Linfeng Song | Ye Tian | Dian Yu | Haitao Mi | Xiangyu Duan | Zhaopeng Tu | Jinsong Su | Dong Yu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Ante Wang | Linfeng Song | Ye Tian | Dian Yu | Haitao Mi | Xiangyu Duan | Zhaopeng Tu | Jinsong Su | Dong Yu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs), but at the cost of increased computational resources. In this work, we identify two key challenges contributing to this inefficiency: over-exploration due to redundant states with semantically equivalent content, and under-exploration caused by high variance in verifier scoring leading to frequent trajectory switching. To address these issues, we propose FETCH – an e ffici ent tree sear ch framework, which is a flexible, plug-and-play system compatible with various tree search algorithms.Our framework mitigates over-exploration by merging semantically similar states using agglomerative clustering of text embeddings obtained from a fine-tuned SimCSE model. To tackle under-exploration, we enhance verifiers by incorporating temporal difference learning with adjusted 𝜆-returns during training to reduce variance, and employing a verifier ensemble to aggregate scores during inference. Experiments on GSM8K, GSM-Plus, and MATH datasets demonstrate that our methods significantly improve reasoning accuracy and computational efficiency across four different tree search algorithms, paving the way for more practical applications of LLM-based reasoning. The code is available at https://github.com/DeepLearnXMU/Fetch.
2024
EmoTrans: Emotional Transition-based Model for Emotion Recognition in Conversation
Zhongquan Jian | Ante Wang | Jinsong Su | Junfeng Yao | Meihong Wang | Qingqiang Wu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Zhongquan Jian | Ante Wang | Jinsong Su | Junfeng Yao | Meihong Wang | Qingqiang Wu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In an emotional conversation, emotions are causally transmitted among communication participants, constituting a fundamental conversational feature that can facilitate the comprehension of intricate changes in emotional states during the conversation and contribute to neutralizing emotional semantic bias in utterance caused by the absence of modality information. Therefore, emotional transition (ET) plays a crucial role in the task of Emotion Recognition in Conversation (ERC) that has not received sufficient attention in current research. In light of this, an Emotional Transition-based Emotion Recognizer (EmoTrans) is proposed in this paper. Specifically, we concatenate the most recent utterances with their corresponding speakers to construct the model input, known as samples, each with several placeholders to implicitly express the emotions of contextual utterances. Based on these placeholders, two components are developed to make the model sensitive to emotions and effectively capture the ET features in the sample. Furthermore, an ET-based Contrastive Learning (CL) is developed to compact the representation space, making the model achieve more robust sample representations. We conducted exhaustive experiments on four widely used datasets and obtained competitive experimental results, especially, new state-of-the-art results obtained on MELD and IEMOCAP, demonstrating the superiority of EmoTrans.
Self-Consistency Boosts Calibration for Math Reasoning
Ante Wang | Linfeng Song | Ye Tian | Baolin Peng | Lifeng Jin | Haitao Mi | Jinsong Su | Dong Yu
Findings of the Association for Computational Linguistics: EMNLP 2024
Ante Wang | Linfeng Song | Ye Tian | Baolin Peng | Lifeng Jin | Haitao Mi | Jinsong Su | Dong Yu
Findings of the Association for Computational Linguistics: EMNLP 2024
Improving LLM Generations via Fine-Grained Self-Endorsement
Ante Wang | Linfeng Song | Baolin Peng | Lifeng Jin | Ye Tian | Haitao Mi | Jinsong Su | Dong Yu
Findings of the Association for Computational Linguistics: ACL 2024
Ante Wang | Linfeng Song | Baolin Peng | Lifeng Jin | Ye Tian | Haitao Mi | Jinsong Su | Dong Yu
Findings of the Association for Computational Linguistics: ACL 2024
This work studies mitigating fact-conflicting hallucinations for large language model (LLM) at inference time.Particularly, we propose a self-endorsement framework that leverages the fine-grained fact-level comparisons across multiple sampled responses.Compared with prior ensemble methods (e.g., self-consistency) that perform response-level selection, our approach can better alleviate hallucinations for knowledge-intensive tasks.Our approach can broadly benefit smaller and open-source LLMs as it mainly conducts simple content-based comparisons.Experiments on Biographies show that our method can effectively improve the factuality of generations with simple and intuitive prompts across different scales of LLMs.Besides, comprehensive analyses on TriviaQA and GSM8K demonstrate the potential of self-endorsement for broader application.
Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal
Jianheng Huang | Leyang Cui | Ante Wang | Chengyi Yang | Xinting Liao | Linfeng Song | Junfeng Yao | Jinsong Su
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Jianheng Huang | Leyang Cui | Ante Wang | Chengyi Yang | Xinting Liao | Linfeng Song | Junfeng Yao | Jinsong Su
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) suffer from catastrophic forgetting during continual learning. Conventional rehearsal-based methods rely on previous training data to retain the model’s ability, which may not be feasible in real-world applications. When conducting continual learning based on a publicly-released LLM checkpoint, the availability of the original training data may be non-existent. To address this challenge, we propose a framework called Self-Synthesized Rehearsal (SSR) that uses the LLM to generate synthetic instances for rehearsal. Concretely, we first employ the base LLM for in-context learning to generate synthetic instances. Subsequently, we utilize the latest LLM to refine the instance outputs based on the synthetic inputs, preserving its acquired ability. Finally, we select diverse high-quality synthetic instances for rehearsal in future stages. Experimental results demonstrate that SSR achieves superior or comparable performance compared to conventional rehearsal-based approaches while being more data-efficient. Besides, SSR effectively preserves the generalization capabilities of LLMs in general domains.
2023
OpenFact: Factuality Enhanced Open Knowledge Extraction
Linfeng Song | Ante Wang | Xiaoman Pan | Hongming Zhang | Dian Yu | Lifeng Jin | Haitao Mi | Jinsong Su | Yue Zhang | Dong Yu
Transactions of the Association for Computational Linguistics, Volume 11
Linfeng Song | Ante Wang | Xiaoman Pan | Hongming Zhang | Dian Yu | Lifeng Jin | Haitao Mi | Jinsong Su | Yue Zhang | Dong Yu
Transactions of the Association for Computational Linguistics, Volume 11
We focus on the factuality property during the extraction of an OpenIE corpus named OpenFact, which contains more than 12 million high-quality knowledge triplets. We break down the factuality property into two important aspects—expressiveness and groundedness—and we propose a comprehensive framework to handle both aspects. To enhance expressiveness, we formulate each knowledge piece in OpenFact based on a semantic frame. We also design templates, extra constraints, and adopt human efforts so that most OpenFact triplets contain enough details. For groundedness, we require the main arguments of each triplet to contain linked Wikidata1 entities. A human evaluation suggests that the OpenFact triplets are much more accurate and contain denser information compared to OPIEC-Linked (Gashteovski et al., 2019), one recent high-quality OpenIE corpus grounded to Wikidata. Further experiments on knowledge base completion and knowledge base question answering show the effectiveness of OpenFact over OPIEC-Linked as supplementary knowledge to Wikidata as the major KG.
Domain Adaptation for Conversational Query Production with the RAG Model Feedback
Ante Wang | Linfeng Song | Ge Xu | Jinsong Su
Findings of the Association for Computational Linguistics: EMNLP 2023
Ante Wang | Linfeng Song | Ge Xu | Jinsong Su
Findings of the Association for Computational Linguistics: EMNLP 2023
Conversational query production is an emerging fundamental task for the dialogue system, where search queries are generated to explore the vast and continually updating knowledge from a search engine. To accelerate this line of research, previous studies have released several datasets with human-annotated search queries. However, the limited annotations still can not cover conversations of various domains. To solve this challenge, we propose a novel domain adaptation framework. It is inspired by a weakly supervised learning algorithm from previous work that guides a model using reinforcement learning with BM25 scores as feedback. Though effective, it is fragile facing noisy content on webpages from a commercial search engine and variance in conversations because of ignoring deep semantic information of dialogue contexts. Thus, we improve the algorithm by taking the advance of retrieval-augmented generation (RAG) and exploring several practical techniques such as knowledge distillation for stable training. We conduct experiments in multiple settings across different languages. Guided by the RAG model feedback, our model is more robust and performs significantly better especially in a more challenging setting over strong baselines.
2021
Improving Graph-based Sentence Ordering with Iteratively Predicted Pairwise Orderings
Shaopeng Lai | Ante Wang | Fandong Meng | Jie Zhou | Yubin Ge | Jiali Zeng | Junfeng Yao | Degen Huang | Jinsong Su
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Shaopeng Lai | Ante Wang | Fandong Meng | Jie Zhou | Yubin Ge | Jiali Zeng | Junfeng Yao | Degen Huang | Jinsong Su
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Dominant sentence ordering models can be classified into pairwise ordering models and set-to-sequence models. However, there is little attempt to combine these two types of models, which inituitively possess complementary advantages. In this paper, we propose a novel sentence ordering framework which introduces two classifiers to make better use of pairwise orderings for graph-based sentence ordering (Yin et al. 2019, 2021). Specially, given an initial sentence-entity graph, we first introduce a graph-based classifier to predict pairwise orderings between linked sentences. Then, in an iterative manner, based on the graph updated by previously predicted high-confident pairwise orderings, another classifier is used to predict the remaining uncertain pairwise orderings. At last, we adapt a GRN-based sentence ordering model (Yin et al. 2019, 2021) on the basis of final graph. Experiments on five commonly-used datasets demonstrate the effectiveness and generality of our model. Particularly, when equipped with BERT (Devlin et al. 2019) and FHDecoder (Yin et al. 2020), our model achieves state-of-the-art performance. Our code is available at https://github.com/DeepLearnXMU/IRSEG.
BACO: A Background Knowledge- and Content-Based Framework for Citing Sentence Generation
Yubin Ge | Ly Dinh | Xiaofeng Liu | Jinsong Su | Ziyao Lu | Ante Wang | Jana Diesner
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Yubin Ge | Ly Dinh | Xiaofeng Liu | Jinsong Su | Ziyao Lu | Ante Wang | Jana Diesner
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
In this paper, we focus on the problem of citing sentence generation, which entails generating a short text to capture the salient information in a cited paper and the connection between the citing and cited paper. We present BACO, a BAckground knowledge- and COntent-based framework for citing sentence generation, which considers two types of information: (1) background knowledge by leveraging structural information from a citation network; and (2) content, which represents in-depth information about what to cite and why to cite. First, a citation network is encoded to provide background knowledge. Second, we apply salience estimation to identify what to cite by estimating the importance of sentences in the cited paper. During the decoding stage, both types of information are combined to facilitate the text generation, and then we conduct a joint training for the generator and citation function classification to make the model aware of why to cite. Our experimental results show that our framework outperforms comparative baselines.
2020
Structural Information Preserving for Graph-to-Text Generation
Linfeng Song | Ante Wang | Jinsong Su | Yue Zhang | Kun Xu | Yubin Ge | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Linfeng Song | Ante Wang | Jinsong Su | Yue Zhang | Kun Xu | Yubin Ge | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs. As a crucial defect, the current state-of-the-art models may mess up or even drop the core structural information of input graphs when generating outputs. We propose to tackle this problem by leveraging richer training signals that can guide our model for preserving input information. In particular, we introduce two types of autoencoding losses, each individually focusing on different aspects (a.k.a. views) of input graphs. The losses are then back-propagated to better calibrate our model via multi-task training. Experiments on two benchmarks for graph-to-text generation show the effectiveness of our approach over a state-of-the-art baseline.
Search
Fix author
Co-authors
- Jinsong Su 13
- Linfeng Song 7
- Dong Yu (于东) 5
- Haitao Mi 4
- Yubin Ge 3
- Lifeng Jin 3
- Ye Tian 3
- Junfeng Yao 3
- Hui Li 2
- Weitao Li 2
- Yang Liu 2
- Weizhi Ma 2
- Baolin Peng 2
- Delai Qiu 2
- Jingyi Ren 2
- Xiaolong Wang 2
- Dian Yu 2
- Yue Zhang 2
- Moye Chen 1
- Leyang Cui 1
- Jana Diesner 1
- Ly Dinh 1
- Xiangyu Duan 1
- Linlu Gong 1
- Zhinan Gou 1
- Qingguo Hu 1
- Degen Huang 1
- Jianheng Huang 1
- Zhongquan Jian 1
- Shaopeng Lai 1
- Yunghwei Lai 1
- Kunquan Li 1
- Xinting Liao 1
- Yujie Lin 1
- Chenqing Liu 1
- Hao Liu 1
- Jingyao Liu 1
- Qingsong Liu 1
- Xiaofeng Liu 1
- Zhanyu Liu 1
- Ziyao Lu 1
- Fandong Meng 1
- Xiaoman Pan 1
- Zhaopeng Tu 1
- Meihong Wang 1
- Zhihao Wang 1
- Qingqiang Wu 1
- Boran Xiang 1
- Zhishang Xiang 1
- Xinyan Xiao 1
- Ge Xu 1
- Kun Xu 1
- Chengyi Yang 1
- Jiali Zeng 1
- Hongming Zhang 1
- Liang Zhang 1
- Jie Zhou 1