Khac-Hoai Nam Bui

2025

Verify-in-the-Graph: Entity Disambiguation Enhancement for Complex Claim Verification with Interactive Graph Representation
Hoang Pham | Thanh-Do Nguyen | Khac-Hoai Nam Bui
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Claim verification is a long-standing and challenging task that demands not only high accuracy but also explainability and thoroughness of the verification process. It has become an emerging research issue in the era of large language models (LLMs), since real-world claims are often complex, featuring intricate semantic structures or obfuscated entities. Traditional approaches typically address this by decomposing claims into sub-claims and querying a knowledge base to resolve hidden or ambiguous entities. However, without effective strategies for disambiguating these entities, the entire verification process can be compromised. To address these challenges, we propose Verify-in-the-Graph (VeGraph), a novel framework leveraging the reasoning and comprehension abilities of LLM agents. VeGraph operates in three phases: (1) Graph Representation: an input claim is decomposed into structured triplets, forming a graph-based representation that integrates both structured and unstructured information; (2) Entity Disambiguation: VeGraph iteratively interacts with the knowledge base to resolve ambiguous entities within the graph for deeper sub-claim verification; and (3) Verification: the remaining triplets are verified to complete the fact-checking process. Experiments using Meta-Llama-3-70B (instruct version) show that VeGraph achieves competitive performance compared to baselines on two benchmarks (HoVer and FEVEROUS), effectively addressing claim verification challenges. Our source code and data are available for further use.
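The three phases above can be illustrated with a minimal sketch. It assumes a hypothetical convention in which obfuscated entities in a decomposed claim are written as placeholders such as `?x`, and it replaces the LLM agent and the real knowledge base with a toy set of facts and exact-match lookup. The names `TOY_KB`, `kb_resolve`, `disambiguate`, and `verify_claim` are illustrative only and are not taken from the VeGraph code.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Triplet = Tuple[str, str, str]

# Toy stand-in for a knowledge base (assumption: exact-match facts, no LLM involved).
TOY_KB: Set[Triplet] = {
    ("Titanic", "directed_by", "James Cameron"),
    ("James Cameron", "born_in", "Canada"),
}


def is_placeholder(term: str) -> bool:
    # Hypothetical convention: obfuscated entities are written as "?x", "?y", ...
    return term.startswith("?")


def kb_resolve(triplet: Triplet, kb: Set[Triplet]) -> Optional[str]:
    """Return the unique entity that fills the single placeholder, else None."""
    s, p, o = triplet
    matches = [
        f for f in kb
        if f[1] == p
        and (is_placeholder(s) or f[0] == s)
        and (is_placeholder(o) or f[2] == o)
    ]
    if len(matches) != 1:  # unknown or still ambiguous: leave unresolved
        return None
    fact = matches[0]
    return fact[0] if is_placeholder(s) else fact[2]


def disambiguate(triplets: List[Triplet],
                 resolve: Callable[[Triplet], Optional[str]]) -> Dict[str, str]:
    """Phase 2 (sketch): bind placeholders, one triplet with a single unknown at a time."""
    bindings: Dict[str, str] = {}
    progress = True
    while progress:  # stops once a full pass adds no new binding
        progress = False
        for s, p, o in triplets:
            s_b, o_b = bindings.get(s, s), bindings.get(o, o)
            unknown = [t for t in (s_b, o_b) if is_placeholder(t)]
            if len(unknown) == 1:
                answer = resolve((s_b, p, o_b))
                if answer is not None and not is_placeholder(answer):
                    bindings[unknown[0]] = answer
                    progress = True
    return bindings


def verify_claim(triplets: List[Triplet], kb: Set[Triplet]) -> bool:
    """Phase 3 (sketch): supported only if every grounded triplet is a known fact."""
    bindings = disambiguate(triplets, lambda t: kb_resolve(t, kb))
    grounded = [(bindings.get(s, s), p, bindings.get(o, o)) for s, p, o in triplets]
    return all(t in kb for t in grounded)


# Phase 1 (sketch): "The director of Titanic was born in Canada" as a triplet graph.
claim = [("Titanic", "directed_by", "?x"), ("?x", "born_in", "Canada")]
print(verify_claim(claim, TOY_KB))
```

Returning `None` on an ambiguous match is a deliberate choice in this sketch: a wrong binding would silently corrupt every later sub-claim, which is the failure mode the abstract attributes to weak disambiguation.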

2024

SynTOD: Augmented Response Synthesis for Robust End-to-End Task-Oriented Dialogue System
Nguyen Quang Chieu | Quang-Minh Tran | Khac-Hoai Nam Bui
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Task-oriented dialogue (TOD) systems are designed to help users achieve specific goals, and building them involves training multiple tasks such as language understanding, dialogue state tracking, and appropriate response generation. One of the remaining challenges in this emerging research field is producing more robust architectures for fine-tuned end-to-end TOD systems. In this study, we address this issue by exploiting the ability of pre-trained models to synthesize responses, which are then used as input for fine-tuning. The main idea is to bridge the gap between the training and inference processes when fine-tuning end-to-end TOD systems. Experiments on the MultiWOZ datasets show the effectiveness of our model compared with strong baselines in this research field. The source code is available for further use.

2022

Multi Graph Neural Network for Extractive Long Document Summarization
Xuan-Dung Doan | Le-Minh Nguyen | Khac-Hoai Nam Bui
Proceedings of the 29th International Conference on Computational Linguistics

Heterogeneous Graph Neural Networks (HeterGNN) have recently been introduced as an emerging approach for extractive document summarization (EDS) by exploiting the cross-relations between words and sentences. However, applying HeterGNN to long documents is still an open research issue. One of the main problems is the lack of inter-sentence connections. In this regard, this paper explores how to apply HeterGNN to long documents by building a graph on sentence-level nodes (a homogeneous graph) and combining it with HeterGNN to capture semantic information in terms of both inter- and intra-sentence connections. Experiments on two benchmark datasets of long documents, PubMed and arXiv, show that our method achieves state-of-the-art results in this research field.
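The abstract does not say how the sentence-level (homogeneous) edges are built, so the sketch below assumes, purely for illustration, whitespace tokenization and a Jaccard word-overlap threshold for sentence-sentence edges, placed alongside the word-sentence containment edges of the heterogeneous graph. The function `build_graphs` and its `threshold` parameter are hypothetical, and the GNN layers that would run over both graphs are omitted.

```python
from itertools import combinations
from typing import List, Tuple


def build_graphs(sentences: List[str], threshold: float = 0.2):
    """Return (vocab, word-sentence edges, sentence-sentence edges) for one document."""
    # Assumption: lowercase whitespace tokens stand in for a real tokenizer.
    tokens = [set(s.lower().split()) for s in sentences]
    vocab = sorted(set().union(*tokens))

    # Heterogeneous part: word w links to sentence i if w occurs in sentence i.
    word_sent: List[Tuple[str, int]] = [
        (w, i) for i, toks in enumerate(tokens) for w in sorted(toks)
    ]

    # Homogeneous part (assumed criterion): link sentences whose word sets overlap enough.
    sent_sent: List[Tuple[int, int]] = []
    for i, j in combinations(range(len(sentences)), 2):
        union = tokens[i] | tokens[j]
        jaccard = len(tokens[i] & tokens[j]) / len(union) if union else 0.0
        if jaccard >= threshold:
            sent_sent.append((i, j))
    return vocab, word_sent, sent_sent


vocab, word_sent, sent_sent = build_graphs(
    ["graphs model documents", "graphs model sentences", "cats sleep"]
)
print(sent_sent)
```

The sentence-sentence edges give each sentence direct neighbors across the document, which is the inter-sentence connectivity the abstract says a word-sentence graph alone lacks; any other similarity measure could replace Jaccard here.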

HeterGraphLongSum: Heterogeneous Graph Neural Network with Passage Aggregation for Extractive Long Document Summarization
Tuan-Anh Phan | Ngoc-Dung Ngoc Nguyen | Khac-Hoai Nam Bui
Proceedings of the 29th International Conference on Computational Linguistics

Graph Neural Network (GNN)-based models have proven effective in various Natural Language Processing (NLP) tasks in recent years. Specifically, for the Extractive Document Summarization (EDS) task, modeling documents as graphs makes it possible to analyze the complex relations between semantic units (e.g., word-to-word, word-to-sentence, sentence-to-sentence) and to enrich sentence representations with valuable information from their neighbors. However, long-form document summarization using graph-based methods is still an open research issue. The main challenge is to represent long documents effectively in a graph structure. To this end, this paper proposes a new heterogeneous graph neural network (HeterGNN) model to improve the performance of long document summarization (HeterGraphLongSum). The main idea is to add passage nodes to the heterogeneous graph of word and sentence nodes to enrich the final representation of sentences. Accordingly, HeterGraphLongSum is designed with three types of semantic units: word, sentence, and passage. Experiments on two benchmark datasets for long documents, PubMed and arXiv, indicate promising results of the proposed model for the extractive long document summarization problem. Notably, HeterGraphLongSum achieves state-of-the-art performance without relying on any pre-trained language models (e.g., BERT). The source code is available for further use on GitHub.
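A minimal sketch of a graph with the three node types named above, returned as typed edge lists. It assumes, for illustration only, that a passage is a fixed-size run of consecutive sentences and that the only edges are word-sentence and sentence-passage; the abstract does not specify either choice. The function `build_hetero_graph` and the `passage_size` parameter are hypothetical and not taken from the HeterGraphLongSum code.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source node id, target node id)


def build_hetero_graph(sentences: List[str], passage_size: int = 2) -> Dict[str, List[Edge]]:
    """Typed edge lists over word ("w:..."), sentence ("s<i>"), and passage ("p<k>") nodes."""
    if passage_size <= 0:
        raise ValueError("passage_size must be positive")
    edges: Dict[str, List[Edge]] = {"word-sentence": [], "sentence-passage": []}
    for i, sent in enumerate(sentences):
        s_id = f"s{i}"
        # Assumption: passage k holds sentences k*passage_size .. (k+1)*passage_size - 1.
        p_id = f"p{i // passage_size}"
        edges["sentence-passage"].append((s_id, p_id))
        # One edge per distinct (lowercased, whitespace-split) word in the sentence.
        for w in sorted(set(sent.lower().split())):
            edges["word-sentence"].append((f"w:{w}", s_id))
    return edges


graph = build_hetero_graph(["a b", "b c", "c d"], passage_size=2)
print(graph["sentence-passage"])
```

In a model of this kind, sentence nodes would aggregate messages from both their words and their passage, so passage nodes act as a coarser context channel for each sentence; how messages are actually passed is left out of this sketch.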

Extractive Text Summarization with Latent Topics using Heterogeneous Graph Neural Network
Tuan Anh Phan | Ngoc Dung Nguyen | Khac-Hoai Nam Bui
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation
