Zhao Lv
2026
ReFL: Reflective Feedback Learning for Hallucination Detection of Large Language Models
Cunhang Fan | Jun Zhang | Xue Zhang | Shuai Zhang | Zhao Lv | Jianhua Tao | Zhengqi Wen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) often generate factually incorrect content, known as “hallucinations”, which undermines the reliability and safety of their outputs. Existing hallucination detection methods either depend on external knowledge sources, incurring high computational costs and limiting real-time applicability, or extract the model’s internal states, leading to poor generalization. To address these issues, this paper proposes ReFL, a hallucination detection framework. ReFL leverages corrective in-context learning to dynamically guide LLMs to recognize their own prediction errors and adjust internal representations, critically without updating model weights. Specifically, it introduces a corrective in-context learning strategy in which triplets of input text, model prediction, and ground-truth label are embedded into the prompt to make the model explicitly aware of its own errors. The model reflects on prior outputs to adjust its internal states and generate semantically structured representations better aligned with factuality. This feedback mechanism encourages the model to shape a more coherent semantic space and enhances the LLM’s internal sensitivity to hallucinations. Experimental results on two benchmark datasets demonstrate that ReFL consistently outperforms existing methods, achieving state-of-the-art performance.
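The triplet-based prompt described in the abstract can be illustrated with a minimal sketch. The abstract does not give the prompt template, so the wording of the prompt, the `Feedback` class, and the `build_corrective_prompt` function are all hypothetical; only the idea of embedding (input text, model prediction, ground-truth label) triplets into the prompt comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """One corrective triplet: input text, the model's earlier prediction, the true label."""
    text: str
    prediction: str
    label: str


def build_corrective_prompt(examples, query):
    """Assemble a prompt that shows the model its own earlier hits and misses.

    The template is an assumption made for illustration, not the paper's template.
    """
    lines = ["Decide whether each statement is factual or hallucinated.", ""]
    for ex in examples:
        verdict = "correct" if ex.prediction == ex.label else "incorrect"
        lines.append(f"Statement: {ex.text}")
        lines.append(f"Your earlier prediction: {ex.prediction}")
        lines.append(f"Ground-truth label: {ex.label}")
        lines.append(f"Your earlier prediction was {verdict}.")
        lines.append("")
    # The new statement to classify goes last, with no label attached.
    lines.append(f"Statement: {query}")
    lines.append("Prediction:")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Feedback("The Eiffel Tower is in Berlin.", "factual", "hallucinated"),
        Feedback("Water boils at 100 C at sea level.", "factual", "factual"),
    ]
    print(build_corrective_prompt(demo, "The Moon orbits the Earth."))
```

In this sketch the model weights are never touched; any effect on the model's internal states would come only from the prompt content, which matches the weight-free setting the abstract describes.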
2025
COLA: Collaborative Multi-Agent Framework with Dynamic Task Scheduling for GUI Automation
Di Zhao | Longhui Ma | Siwei Wang | Miao Wang | Zhao Lv
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
With the rapid advancements in Large Language Models (LLMs), an increasing number of studies have leveraged LLMs as the cognitive core of agents to address complex task decision-making challenges. In particular, recent research has demonstrated the potential of LLM-based agents in automating GUI operations. However, existing methodologies exhibit two critical challenges: (1) static agent architectures struggle to adapt to diverse GUI application scenarios, leading to inadequate scenario generalization; (2) agent workflows lack a fault-tolerance mechanism, necessitating complete process re-execution after a GUI agent decision error. To address these limitations, we introduce COLA, a collaborative multi-agent framework for automating GUI operations. In this framework, a scenario-aware Task Scheduler agent decomposes task requirements into atomic capability units and dynamically selects the optimal agent from a decision agent pool, effectively responding to the capability requirements of diverse scenarios. Furthermore, we develop an interactive backtracking mechanism that enables humans to intervene and trigger state rollbacks for non-destructive process repair. Experiments on the GAIA dataset show that COLA achieves competitive performance among GUI Agent methods, with an average accuracy of 31.89%. On WindowsAgentArena, it performs particularly well in Web Browser (33.3%), Media & Video (33.3%), and Windows Utils (25.0%), suggesting the effectiveness of specialized agent design and dynamic strategy allocation. The code is available at https://github.com/Alokia/COLA-demo.
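The two mechanisms named in the abstract, capability-based agent selection and state rollback, can be sketched as follows. This is not the COLA implementation: the selection rule (largest overlap between required and offered capabilities), the `select_agent` function, the `History` class, and the agent and capability names are all assumptions made for illustration.

```python
def select_agent(required, pool):
    """Pick the agent whose capabilities overlap most with the required units.

    `pool` maps a hypothetical agent name to its set of capability names.
    Names are scanned in sorted order so that ties resolve deterministically.
    """
    return max(sorted(pool), key=lambda name: len(required & pool[name]))


class History:
    """Stack of state snapshots that supports rolling back to an earlier one."""

    def __init__(self):
        self._states = []

    def checkpoint(self, state):
        # Store a copy so later mutation of `state` cannot alter the snapshot.
        self._states.append(dict(state))

    def rollback(self, steps=1):
        """Discard the last `steps` snapshots and return the one now on top."""
        if steps < 1 or steps >= len(self._states):
            raise ValueError("cannot roll back that many steps")
        del self._states[-steps:]
        return dict(self._states[-1])


if __name__ == "__main__":
    pool = {
        "browser_agent": {"navigate", "click"},
        "file_agent": {"open_file", "rename"},
    }
    print(select_agent({"navigate", "click"}, pool))

    history = History()
    for step in range(3):
        history.checkpoint({"step": step})
    print(history.rollback(1))
```

Keeping snapshots as copies is what makes the rollback non-destructive in this sketch: earlier states stay intact regardless of what later steps do.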
2024
UNO Arena for Evaluating Sequential Decision-Making Capability of Large Language Models
Zhanyue Qin | Haochuan Wang | Deyuan Liu | Ziyang Song | Cunhang Fan | Zhao Lv | Jinlin Wu | Zhen Lei | Zhiying Tu | Dianhui Chu | Xiaoyan Yu | Dianbo Sui
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Sequential decision-making refers to algorithms that take into account the dynamics of the environment, where early decisions affect subsequent decisions. With large language models (LLMs) demonstrating powerful capabilities across tasks, we can’t help but ask: Can Current LLMs Effectively Make Sequential Decisions? To answer this question, we propose the UNO Arena, based on the card game UNO, to evaluate the sequential decision-making capability of LLMs, and explain in detail why we choose UNO. In UNO Arena, we evaluate the sequential decision-making capability of LLMs dynamically with novel metrics based on Monte Carlo methods. We set up random players, DQN-based reinforcement learning players, and LLM players (e.g. GPT-4, Gemini-pro) for comparison testing. Furthermore, to improve the sequential decision-making capability of LLMs, we propose the TUTRI player, which has LLMs reflect on their own actions using a summary of the game history and the game strategy. Numerous experiments demonstrate that the TUTRI player achieves a notable breakthrough in sequential decision-making performance compared to the vanilla LLM player.
Bilateral Masking with prompt for Knowledge Graph Completion
Yonghui Kong | Cunhang Fan | Yujie Chen | Shuai Zhang | Zhao Lv | Jianhua Tao
Findings of the Association for Computational Linguistics: NAACL 2024
The pre-trained language model (PLM) has achieved significant success in the field of knowledge graph completion (KGC) by effectively modeling entity and relation descriptions. In recent studies, the research in this field has been categorized into methods based on word matching and sentence matching, with the former lagging significantly behind. A critical issue in word matching methods is that they fail to obtain satisfactory single embedding representations for entities. To address this issue and enhance entity representation, we propose the Bilateral Masking with prompt for Knowledge Graph Completion (BMKGC) approach. Our methodology employs prompts to narrow the distance between the predicted entity and the known entity. Additionally, the BMKGC model incorporates a bi-encoder architecture, enabling simultaneous predictions at both the head and tail. Furthermore, we propose a straightforward technique to augment positive samples, mitigating the problem of degree bias present in knowledge graphs and thereby improving the model’s robustness. Experimental results conclusively demonstrate that BMKGC achieves state-of-the-art performance on the WN18RR dataset.
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu | Zhanyue Qin | Hairu Wang | Zhao Yang | Zecheng Wang | Fangying Rong | Qingbin Liu | Yanchao Hao | Bo Li | Xi Chen | Cunhang Fan | Zhao Lv | Dianhui Chu | Zhiying Tu | Dianbo Sui
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
While large language models (LLMs) excel in many domains, their complexity and scale challenge deployment in resource-limited environments. Current compression techniques, such as parameter pruning, often fail to effectively utilize the knowledge from pruned parameters. To address these challenges, we propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA), a novel approach that uses manifold learning and the Information Bottleneck (IB) measure to merge similar layers, reducing model size while preserving essential performance. We evaluate MKA on multiple benchmark datasets and various LLMs. Our findings show that MKA not only preserves model performance but also achieves substantial compression ratios, outperforming traditional pruning methods. Moreover, when coupled with quantization, MKA delivers even greater compression. Specifically, on the MMLU dataset using the Llama3-8B model, MKA achieves a compression ratio of 43.75% with a minimal performance decrease of only 2.82%. The proposed MKA method offers a resource-efficient and performance-preserving model compression technique for LLMs. We make our code available at https://github.com/SempraETY/Pruning-via-Merging
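The layer-merging idea and the reported ratio can be made concrete with a rough sketch. Two assumptions are made here that the abstract does not state: that the compression ratio is the fraction of transformer layers merged away, and that two similar layers are fused by a convex combination of their weights with a mixing coefficient `alpha`. The functions `compression_ratio` and `merge_layers` are hypothetical, and the manifold-learning and Information Bottleneck steps that would choose the layer pairs and the coefficient are not modeled.

```python
def compression_ratio(total_layers, removed_layers):
    """Fraction of layers removed by merging (assumed definition of the ratio)."""
    return removed_layers / total_layers


def merge_layers(weights_a, weights_b, alpha):
    """Fuse two layers' flattened weights as alpha * a + (1 - alpha) * b.

    `alpha` stands in for a similarity-derived coefficient; how MKA derives
    it is not described in the abstract.
    """
    if len(weights_a) != len(weights_b):
        raise ValueError("layers must have the same number of weights")
    return [alpha * a + (1 - alpha) * b for a, b in zip(weights_a, weights_b)]


if __name__ == "__main__":
    # Llama3-8B has 32 transformer layers; under the assumed definition,
    # merging away 14 of them corresponds to the 43.75% figure.
    print(compression_ratio(32, 14))
    print(merge_layers([1.0, 2.0], [3.0, 4.0], 0.5))
```

Under these assumptions, a 43.75% ratio on a 32-layer model would leave 18 layers after merging.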
Co-authors
- Cunhang Fan 4
- Dianhui Chu 2
- Deyuan Liu 2
- Zhanyue Qin 2
- Dianbo Sui 2
- Jianhua Tao 2
- Zhiying Tu 2
- Yujie Chen 1
- Xi Chen 1
- Yanchao Hao 1
- Yonghui Kong 1
- Zhen Lei 1
- Bo Li 1
- Qingbin Liu 1
- Longhui Ma 1
- Fangying Rong 1
- Ziyang Song 1
- Haochuan Wang 1
- Siwei Wang 1
- Miao Wang 1
- Hairu Wang 1
- Zecheng Wang 1
- Zhengqi Wen 1
- Jinlin Wu 1
- Zhao Yang 1
- Xiaoyan Yu 1
- Shuai Zhang 1
- Jun Zhang 1
- Xue Zhang 1
- Shuai Zhang 1
- Di Zhao 1