Mengjia Wu
2026
Evidence-Aligned Entity Verification for Hallucination Detection in Retrieval-Augmented Generation
Runsong Jia | Zhen Fang | Mengjia Wu | Jie Lu | Yi Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Runsong Jia | Zhen Fang | Mengjia Wu | Jie Lu | Yi Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Hallucination detection is crucial for large language models (LLMs), as hallucinated content creates significant barriers in applications requiring factual accuracy. Current detection methods mainly depend on internal signals like uncertainty and self-consistency checks, using the model’s pre-trained knowledge to identify unreliable outputs. However, pre-trained knowledge may become outdated and has coverage limitations, especially for specialized or recent information. To address these limitations, retrieval-augmented generation (RAG) has emerged as a promising solution by retrieving relevant evidence at inference time, grounding outputs beyond the model’s parametric knowledge. In this paper, we target a critical and practical learning problem RAG-based hallucination detection (RHD), where RAG is employed to enhance hallucination detection by addressing information updating challenges. To address RHD, we propose a novel method Evidence-Aligned Entity Verification (EAEV), which detects entity-level hallucinations by leveraging RAG to align generated entities with retrieved evidence contexts. Specifically, EAEV evaluates entity-evidence alignment through three complementary dimensions and introduces counterfactual stability analysis to ensure robust alignments under evidence perturbations. Experiments across multiple RAG benchmarks demonstrate that EAEV achieves consistent improvements over existing methods with strong generalization capabilities.
From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation
Mingfei Lu | Yi Zhang | Mengjia Wu | Yue Feng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Mingfei Lu | Yi Zhang | Mengjia Wu | Yue Feng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Legal consultation question answering (Legal CQA) presents unique challenges compared to traditional legal QA tasks, including the scarcity of high-quality training data, complex task composition, and strong contextual dependencies. To address these, we construct JurisCQAD, a large-scale dataset of over 43,000 real-world Chinese legal queries annotated with expert-validated positive and negative responses, and design a structured task decomposition that converts each query into a legal element graph integrating entities, events, intents, and legal issues. We further propose JurisMA, a modular multi-agent framework supporting dynamic routing, statutory grounding, and stylistic optimization. Combined with the element graph, the framework enables strong context-aware reasoning, effectively capturing dependencies across legal facts, norms, and procedural logic. Trained on JurisCQAD and evaluated on a refined LawBench, our system significantly outperforms both general-purpose and legal-domain LLMs across multiple lexical and semantic metrics, demonstrating the benefits of interpretable decomposition and modular collaboration in Legal CQA.
2025
Equal Truth: Rumor Detection with Invariant Group Fairness
Junyi Chen | Mengjia Wu | Qian Liu | Jing Sun | Ying Ding | Yi Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Junyi Chen | Mengjia Wu | Qian Liu | Jing Sun | Ying Ding | Yi Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Due to the widespread dissemination of rumors on social media platforms, detecting rumors has been a long-standing concern for various communities. However, existing rumor detection methods rarely consider the fairness issues inherent in the model, which can lead to biased predictions across different stakeholder groups (e.g., domains and originating platforms of the detected content), also undermining their detection effectiveness. In this work, we propose a two-step framework to address this issue. First, we perform unsupervised partitioning to dynamically identify potential unfair data patterns without requiring sensitive attribute annotations. Then, we apply invariant learning to these partitions to extract fair and informative feature representations that enhance rumor detection. Extensive experiments show that our method outperforms strong baselines regarding detection and fairness performance, and also demonstrate robust performance on out-of-distribution samples. Further empirical results indicate that our learned features remain informative and fair across stakeholder groups and can correct errors when applied to existing baselines.
HetGCoT: Heterogeneous Graph-Enhanced Chain-of-Thought LLM Reasoning for Academic Question Answering
Runsong Jia | Mengjia Wu | Ying Ding | Jie Lu | Yi Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Runsong Jia | Mengjia Wu | Ying Ding | Jie Lu | Yi Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Academic question answering (QA) in heterogeneous scholarly networks presents unique challenges requiring both structural understanding and interpretable reasoning. While graph neural networks (GNNs) capture structured graph information and large language models (LLMs) demonstrate strong capabilities in semantic comprehension, current approaches lack integration at the reasoning level. We propose HetGCoT, a framework enabling LLMs to effectively leverage and learn information from graphs to reason interpretable academic QA results. Our framework introduces three technical contributions: (1) a framework that transforms heterogeneous graph structural information into LLM-processable reasoning chains, (2) an adaptive metapath selection mechanism identifying relevant subgraphs for specific queries, and (3) a multi-step reasoning strategy systematically incorporating graph contexts into the reasoning process. Experiments on OpenAlex and DBLP datasets show our approach outperforms all sota baselines. The framework demonstrates adaptability across different LLM architectures and applicability to various scholarly question answering tasks.