Yaxi Li
2026
ContextCheck: Sentence-Level Faithfulness Verification with Context-Aware Disambiguation
Yueqin Yin | Yaxi Li | Xin Liu | Xun Wang | Kaiqiang Song | Simin Ma | Shujian Liu | Sathish Reddy Indurthi | Haoyun Deng | Pengcheng He | Mingyuan Zhou | Song Wang
Findings of the Association for Computational Linguistics: ACL 2026
Large language models often hallucinate, producing content that is factually incorrect or not grounded in the sources. Reliable faithfulness verification is critical for trustworthy deployment. In the provided-source (closed-world) setting, existing verifiers either classify whole passages in one step or check sentences independently, overlooking cross-sentence context. We present ContextCheck, a framework for sentence-level faithfulness verification with context-aware disambiguation. Each sentence is verified against the grounding document while conditioning on preceding sentences, enabling pronouns and references to be resolved directly in context. This design avoids the separate decontextualization step of rewriting claims into self-contained forms, casting verification as a context-conditioned task. Fine-tuned from Llama-3.1-8B-Instruct, ContextCheck sets a new state of the art on three context-dependent datasets, improving Macro F1 by over 10 points over the strongest baselines, and matches or slightly surpasses prior 8B-scale verifiers on 14 standard single-sentence datasets (average Macro F1 73.5 vs. 72.8). These results show that ContextCheck offers a practical and effective approach for sentence-level hallucination detection.
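The context-conditioned loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Verifier` signature, `verify_response`, and `toy_verifier` are hypothetical names introduced here, and the toy verifier uses naive word overlap in place of the fine-tuned model.

```python
from typing import Callable, List

# Hypothetical signature: (document, preceding_context, sentence) -> supported?
Verifier = Callable[[str, str, str], bool]


def verify_response(document: str, sentences: List[str], verifier: Verifier) -> List[bool]:
    """Check each sentence against the document, conditioning on the sentences before it."""
    labels = []
    for i, sentence in enumerate(sentences):
        # The preceding sentences are passed along instead of rewriting
        # the sentence into a self-contained claim first.
        context = " ".join(sentences[:i])
        labels.append(verifier(document, context, sentence))
    return labels


def toy_verifier(document: str, context: str, sentence: str) -> bool:
    """Toy stand-in for a learned verifier (an assumption, for illustration only)."""
    words = sentence.rstrip(".").split()
    # Crude pronoun resolution: swap a leading pronoun for the first word of the context.
    if words and words[0] in {"He", "She", "It", "They"} and context:
        words[0] = context.split()[0]
    doc_words = set(document.lower().rstrip(".").split())
    # A sentence counts as supported only if every word appears in the document.
    return all(w.lower() in doc_words for w in words)


document = "Curie won the Nobel Prize in 1903."
sentences = ["Curie won the Nobel Prize.", "She won in 1903.", "She won in 1911."]
labels = verify_response(document, sentences, toy_verifier)
```

With this toy setup, the second sentence is only judged supported because the preceding sentence supplies the referent for "She"; the same sentence checked with an empty context is rejected.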
2025
FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data
Deren Lei | Yaxi Li | Siyao Li | Mengya Hu | Rui Xu | Ken Archer | Mingyu Wang | Emily Ching | Alex Deng
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Prior research on training grounded factuality classification models to detect hallucinations in large language models (LLMs) has relied on public natural language inference (NLI) data and synthetic data. However, conventional NLI datasets are not well-suited for document-level reasoning, which is critical for detecting LLM hallucinations. Recent approaches to document-level synthetic data generation involve iteratively removing sentences from documents and annotating factuality using LLM-based prompts. While effective, this method is computationally expensive for long documents and limited by the LLM’s capabilities. In this work, we analyze the differences between existing synthetic training data used in state-of-the-art models and real LLM output claims. Based on our findings, we propose a novel approach for synthetic data generation, CG2C, that leverages multi-hop reasoning on context graphs extracted from documents. Our fact checker model, FactCG, demonstrates improved performance with more connected reasoning, using the same backbone models. Experiments show it even outperforms GPT-4o on the LLM-Aggrefact benchmark with a much smaller model size.