Peng Yan
2026
TrendFact: A Benchmark Towards Hotspot Perception in Automatic Fact-Checking
Xiaocheng Zhang | Xi Wang | Yifei Lu | Jianing Wang | Zhuangzhuang Ye | Mengjiao Bao | Peng Yan | Xiaohong Su
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the surge of online misinformation, Large Language Models (LLMs) and Reasoning Large Language Models (RLMs) serving as Automatic Fact-Checking (AFC) systems have emerged as a prominent paradigm for reliable, explainable verification. However, our empirical study reveals that this paradigm faces a critical risk-asymmetry challenge when deployed in real-world, resource-constrained environments. While Hotspot Perception Ability (HPA), the capacity to dynamically allocate reasoning resources based on social impact, is essential to mitigate this risk, existing benchmarks lack the social metadata and evaluation framework to meet this urgent evaluation need, thereby hindering the advancement of these AFC systems. To bridge this gap, we introduce TrendFact, the first benchmark capable of evaluating HPA and three fact-checking tasks. It consists of 7,643 curated samples sourced from trending platforms and professional datasets, with an evidence library containing 366,634 entries. To enable HPA assessment, we propose two novel metrics: the Explanation Consistency Score (ECS) to evaluate the reliability of verification reasoning, and the Hotspot Claim Perception Index (HCPI) to quantify the overall HPA of AFC systems. Extensive experiments demonstrate that existing AFC systems exhibit limited performance on TrendFact. Furthermore, our proposed FactISR framework effectively enhances HPA and computational efficiency for RLM-driven systems.
2025
From Observation to Understanding: Front-Door Adjustments with Uncertainty Calibration for Enhancing Egocentric Reasoning in LVLMs
Shenshen Li | Wenxin Meng | Lei Wang | Hao Yang | Chong Peng | Peng Yan | Fumin Shen | Jingkuan Song | Heng Tao Shen | Xing Xu
Findings of the Association for Computational Linguistics: ACL 2025
Recent progress in large vision-language models (LVLMs) has shown substantial potential across a broad spectrum of third-person tasks. However, adapting these LVLMs to egocentric scenarios remains challenging due to their third-person training bias. Existing methods that adapt LVLMs for first-person tasks often overlook critical agent-environment interactions, limiting their ability to perform egocentric reasoning. To address these challenges, we propose a novel zero-shot paradigm termed Front-Door Adjustments with Uncertainty Calibration (FRUIT) to enhance the egocentric reasoning abilities of LVLMs by simulating human causal reasoning. Specifically, FRUIT operates in two stages: observation and understanding. Unlike conventional prompting techniques, we formalize egocentric reasoning using a structural causal model. Then, we ground interaction regions and expand them into hierarchical visual cues, augmented with corresponding captions, to form the initial observations. To reduce noise in these observations, we employ uncertainty calibration to filter out unreliable information. These refined observations, serving as mediators, are then incorporated into the prompt template, guiding the model to understand semantics from a first-person perspective. Extensive experiments conducted on the EgoThink benchmark demonstrate that our FRUIT method consistently enhances the performance of existing LVLMs on six distinct tasks. Our code is available at https://github.com/Mrshenshen/FRUIT.
Consistency-Aware Online Multi-Objective Alignment for Related Search Query Generation
Shuxian Bi | Chongming Gao | Wenjie Wang | Yueqi Mou | Chenxu Wang | Tang Biao | Peng Yan | Fuli Feng
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Modern digital platforms rely on related search query recommendations to enhance engagement, yet existing methods fail to reconcile click-through rate (CTR) optimization with topic expansion. We propose **CMAQ**, a **C**onsistent **M**ulti-Objective **A**ligned **Q**uery generation framework that harmonizes these goals through three components: (1) reward modeling to quantify objectives, (2) style alignment for format compliance, and (3) consistency-aware optimization to coordinate joint improvements. CMAQ employs adaptive 𝛽-scaled DPO with geometric mean rewards, balancing CTR and expansion while mitigating objective conflicts. Extensive offline and online evaluations in a large-scale industrial setting demonstrate CMAQ’s superiority, achieving significant CTR gains (+2.3%) and higher human-rated query quality compared to state-of-the-art methods. Our approach enables high-quality query generation while sustaining user engagement and platform ecosystem health.
2024
Calibrating the Confidence of Large Language Models by Eliciting Fidelity
Mozhi Zhang | Mianqiu Huang | Rundong Shi | Linsen Guo | Chong Peng | Peng Yan | Yaqian Zhou | Xipeng Qiu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, post-alignment, these language models often exhibit overconfidence, where the expressed confidence is not well calibrated with their correctness rate. In this paper, we decompose the language model confidence into the Uncertainty about the question and the Fidelity to the answer generated by language models. Then, we propose a plug-and-play method, UF Calibration, to estimate the confidence of language models. In experiments with six RLHF-LMs on four MCQA datasets, our method shows good calibration performance. Moreover, we propose two novel metrics, IPR and CE, to evaluate the calibration of the model, and we provide a detailed discussion of Truly Well-Calibrated Confidence for large language models. Our method could serve as a strong baseline, and we hope that this work will provide some insights into model confidence calibration.
Co-authors
- Chong Peng 2
- Mengjiao Bao 1
- Shuxian Bi 1
- Tang Biao 1
- Fuli Feng 1
- Chongming Gao 1
- Linsen Guo 1
- Mianqiu Huang 1
- Shenshen Li 1
- Yifei Lu 1
- Wenxin Meng 1
- Yueqi Mou 1
- Xipeng Qiu (邱锡鹏) 1
- Fumin Shen 1
- Heng Tao Shen 1
- Rundong Shi 1
- Jingkuan Song 1
- Xiaohong Su 1
- Xi Wang 1
- Jianing Wang 1
- Lei Wang 1
- Wenjie Wang 1
- Chenxu Wang 1
- Xing Xu 1
- Hao Yang 1
- Zhuangzhuang Ye 1
- Xiaocheng Zhang 1
- Mozhi Zhang 1
- Yaqian Zhou 1