Qianqian Zhuang
2025
InteGround: On the Evaluation of Verification and Retrieval Planning in Integrative Grounding
Cheng Jiayang
|
Qianqian Zhuang
|
Haoran Li
|
Chunkit Chan
|
Xin Liu
|
Lin Qiu
|
Yangqiu Song
Findings of the Association for Computational Linguistics: EMNLP 2025
Grounding large language models (LLMs) in external knowledge sources is a promising method for faithful prediction. While existing grounding approaches work well for simple queries, many real-world information needs require synthesizing multiple pieces of evidence. We introduce “integrative grounding” – the challenge of retrieving and verifying multiple inter-dependent pieces of evidence to support a hypothesis query. To systematically study this problem, we repurpose data from four domains for evaluating integrative grounding capabilities. Our investigation reveals two critical findings: First, in groundedness verification, while LLMs are robust to redundant evidence, they tend to rationalize using internal knowledge when information is incomplete. Second, in examining retrieval planning strategies, we find that undirected planning can degrade performance through noise introduction, while premise abduction emerges as a promising approach due to its logical constraints. Additionally, LLMs’ zero-shot self-reflection capabilities consistently improve grounding quality. These insights provide valuable direction for developing more effective integrative grounding systems.
2024
ECON: On the Detection and Resolution of Evidence Conflicts
Cheng Jiayang
|
Chunkit Chan
|
Qianqian Zhuang
|
Lin Qiu
|
Tianhang Zhang
|
Tengxiao Liu
|
Yangqiu Song
|
Yue Zhang
|
Pengfei Liu
|
Zheng Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The rise of large language models (LLMs) has significantly influenced the quality of information in decision-making systems, leading to the prevalence of AI-generated content and challenges in detecting misinformation and managing conflicting information, or “inter-evidence conflicts.” This study introduces a method for generating diverse, validated evidence conflicts to simulate real-world misinformation scenarios. We evaluate conflict detection methods, including Natural Language Inference (NLI) models, factual consistency (FC) models, and LLMs, on these conflicts (RQ1) and analyze LLMs’ conflict resolution behaviors (RQ2). Our key findings include: (1) NLI and LLM models exhibit high precision in detecting answer conflicts, though weaker models suffer from low recall; (2) FC models struggle with lexically similar answer conflicts, while NLI and LLM models handle these better; and (3) stronger models like GPT-4 show robust performance, especially with nuanced conflicts. For conflict resolution, LLMs often favor one piece of conflicting evidence without justification and rely on internal knowledge if they have prior beliefs.
Search
Fix author
Co-authors
- Chunkit Chan 2
- Cheng Jiayang 2
- Lin Qiu 2
- Yangqiu Song 2
- Haoran Li 1
- show all...