Linhao Luo
2026
Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing
Shigeng Chen | Linhao Luo | Zhangchi Qiu | Yanan Cao | Carl Yang | Shirui Pan
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Knowledge editing (KE) has recently emerged as a promising technique to update specific facts in large language models (LLMs) without full retraining. While existing KE methods show promising results on general-domain benchmarks, their effectiveness in the medical domain remains largely unexplored. Medical knowledge editing poses unique challenges, requiring models not only to memorize new facts but also to internalize and generalize them for reliable and interpretable clinical decision-making. In this work, we propose MedEditBench, a rigorous evaluation framework for assessing medical knowledge editing. Our preliminary results reveal that the current KE paradigm, which directly edits simple answers into LLMs, often leads to superficial updates with poor generalization. To address this, we introduce Self-Generated Rationale Editing (SGR-Edit), which leverages model-generated rationales as editing targets, enabling deeper knowledge integration. Extensive experiments across diverse LLMs and KE methods demonstrate that SGR-Edit consistently improves editing efficacy and generalization. Furthermore, we examine the impact of sequential edits on in-domain medical knowledge, external-domain knowledge, and general model capabilities, offering practical insights for deploying KE in real-world medical applications.
2025
Continual Learning of Large Language Models
Tongtong Wu | Trang Vu | Linhao Luo | Gholamreza Haffari
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
As large language models (LLMs) continue to expand in size and utility, keeping them current with evolving knowledge and shifting user preferences becomes an increasingly urgent yet challenging task. This tutorial offers a comprehensive exploration of continual learning (CL) in the context of LLMs, presenting a structured framework that spans continual pre-training, instruction tuning, and alignment. Grounded in recent survey work and empirical studies, we discuss emerging trends, key methods, and practical insights from both academic research and industry deployments. In addition, we highlight the new frontier of lifelong LLM agents, i.e., systems capable of autonomous, self-reflective, and tool-augmented adaptation. Participants will gain a deep understanding of the computational, algorithmic, and ethical challenges inherent to CL in LLMs, and learn about strategies to mitigate forgetting, manage data and evaluation pipelines, and design systems that can adapt responsibly and reliably over time. This tutorial will benefit researchers and practitioners interested in advancing the long-term effectiveness, adaptability, and safety of foundation models.
2024
RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations
Haolan Zhan | Zhuang Li | Xiaoxi Kang | Tao Feng | Yuncheng Hua | Lizhen Qu | Yi Ying | Mei Rianto Chandra | Kelly Rosalin | Jureynolds Jureynolds | Suraj Sharma | Shilin Qu | Linhao Luo | Ingrid Zukerman | Lay-Ki Soon | Zhaleh Semnani Azad | Reza Haf
Findings of the Association for Computational Linguistics: NAACL 2024
Norm violations occur when individuals fail to conform to culturally accepted behaviors, which may lead to potential conflicts. Remediating norm violations requires social awareness and cultural sensitivity to the nuances at play. To equip interactive AI systems with a remediation ability, we offer ReNoVi, a large-scale corpus of 9,258 multi-turn dialogues annotated with social norms, and define a sequence of tasks to help understand and remediate norm violations step by step. ReNoVi consists of two parts: 512 human-authored dialogues (real data) and 8,746 synthetic conversations generated by ChatGPT through prompt learning. While collecting sufficient human-authored data is costly, synthetic conversations provide a suitable amount of data to help mitigate the scarcity of training data, as well as the chance to assess the alignment between LLMs and humans in the awareness of social norms. We thus harness the power of ChatGPT to generate synthetic training data for our task. To ensure the quality of both human-authored and synthetic data, we follow a quality control protocol during data collection. Our experimental results demonstrate the importance of remediating norm violations in socio-cultural conversations, as well as the improvement in performance obtained from synthetic data.
Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs
Minh-Vuong Nguyen | Linhao Luo | Fatemeh Shiri | Dinh Phung | Yuan-Fang Li | Thuy-Trang Vu | Gholamreza Haffari
Findings of the Association for Computational Linguistics: ACL 2024
Large language models (LLMs) have demonstrated strong reasoning abilities when prompted to generate chain-of-thought (CoT) explanations alongside answers. However, previous research on evaluating LLMs has solely focused on answer accuracy, neglecting the correctness of the generated CoT. In this paper, we delve deeper into the CoT reasoning capabilities of LLMs in multi-hop question answering by utilizing knowledge graphs (KGs). We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs’ knowledge of reasoning and the accuracy of the generated CoT. Through experiments conducted on 5 different families of LLMs across 2 multi-hop question-answering datasets, we find that LLMs possess sufficient knowledge to perform reasoning. However, there exists a significant disparity between answer accuracy and faithfulness of the CoT generated by LLMs, indicating that they often arrive at correct answers through incorrect reasoning.
2023
Systematic Assessment of Factual Knowledge in Large Language Models
Linhao Luo | Trang Vu | Dinh Phung | Reza Haf
Findings of the Association for Computational Linguistics: EMNLP 2023
Previous studies have relied on existing question-answering benchmarks to evaluate the knowledge stored in large language models (LLMs). However, this approach has limitations regarding factual knowledge coverage, as it mostly focuses on generic domains which may overlap with the pretraining data. This paper proposes a framework to systematically assess the factual knowledge of LLMs by leveraging knowledge graphs (KGs). Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions. We systematically evaluate the state-of-the-art LLMs with KGs in generic and specific domains. The experiments show that ChatGPT is consistently the top performer across all domains. We also find that LLMs' performance depends on instruction finetuning, domain, and question complexity, and is prone to adversarial context.
Co-authors
- Reza Haf 2
- Gholamreza Haffari 2
- Dinh Phung 2
- Trang Vu 2
- Yanan Cao 1
- Mei Rianto Chandra 1
- Shigeng Chen 1
- Tao Feng 1
- Yuncheng Hua 1
- Jureynolds Jureynolds 1
- Xiaoxi Kang 1
- Zhuang Li 1
- Yuan-Fang Li 1
- Minh-Vuong Nguyen 1
- Shirui Pan 1
- Zhangchi Qiu 1
- Lizhen Qu 1
- Shilin Qu 1
- Kelly Rosalin 1
- Zhaleh Semnani Azad 1
- Suraj Sharma 1
- Fatemeh Shiri 1
- Lay-Ki Soon 1
- Thuy Vu 1
- Tongtong Wu 1
- Carl Yang 1
- Yi Ying 1
- Haolan Zhan 1
- Ingrid Zukerman 1