Haipeng Chen
2026
Confidence-Aware Ranker Ensembles for Robust In-Context Knowledge Editing
Tejal Nair | Mahmud Wasif Nafee | Maiqi Jiang | Ashley Gao | Haipeng Chen | Yanfu Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Although large language models (LLMs) excel at factual recall, they can still propagate stale or incorrect knowledge. In-context knowledge editing offers a gradient-free remedy suitable for black-box APIs, but in-context editors typically rely on a single retriever and surface-similarity heuristics to build prompts. A key observation in this study is that retrievers can be complementary: semantic rankers may recover paraphrased evidence, while lexical or feature-based retrievers may preserve precise entities and cues. This leaves two gaps in single-retriever editors: they (i) miss complementary evidence that different retrievers surface and (ii) cannot adapt when one retriever is clearly more reliable for a query. We introduce a Feature-Weighted Ensemble for In-context Knowledge Editing (FWE-IKE) that calibrates three heterogeneous rankers (LLM-, BERT-, and MLP-based), extracts simple confidence features from each ranker, predicts per-query mixture weights, and applies a conservative margin-based routing gate that selects a single expert when confident and otherwise mixes the calibrated distributions with the learned per-query weights. On the CounterFact benchmark, FWE-IKE attains an 88.33% Edit-Success Rate, a +3.0 point gain over the best single retriever, approaching the oracle upper bound (91%). Case studies, an ablation study, and analyses show the method systematically recovers complementary wins (e.g., BERT-only, LLM-only, MLP-only slices). FWE-IKE improves edit accuracy without touching model weights and provides a practical path to more robust, confidence-aware retrieval for IKE.
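The gate-then-mix step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `route_or_mix`, the default `margin` value, and the choice to measure the margin as the gap between the two largest predicted weights are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List


def route_or_mix(
    dists: Dict[str, List[float]],
    weights: Dict[str, float],
    margin: float = 0.3,  # hypothetical value, not taken from the paper
) -> List[float]:
    """Return one distribution over candidate demonstrations.

    dists:   ranker name -> calibrated probabilities over the same candidates.
    weights: ranker name -> per-query mixture weight (assumed to sum to 1).
    """
    # Order rankers by predicted weight, largest first.
    ranked = sorted(weights, key=weights.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]

    # Assumed gate: route to a single expert only when its weight
    # leads the runner-up by at least `margin`.
    if weights[top] - weights[runner_up] >= margin:
        return list(dists[top])

    # Otherwise mix the calibrated distributions with the per-query weights.
    n = len(dists[top])
    return [sum(weights[r] * dists[r][i] for r in dists) for i in range(n)]
```

With a clearly dominant weight the sketch returns that ranker's distribution unchanged; with near-tied weights it returns the weighted mixture, which still sums to 1 when each input distribution and the weights do.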
2025
Dynamic Retriever for In-Context Knowledge Editing via Policy Optimization
Mahmud Wasif Nafee | Maiqi Jiang | Haipeng Chen | Yanfu Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) excel at factual recall yet still propagate stale or incorrect knowledge. In-context knowledge editing offers a gradient-free remedy suitable for black-box APIs, but current editors rely on static demonstration sets chosen by surface-level similarity, leading to two persistent obstacles: (i) a quantity–quality trade-off, and (ii) lack of adaptivity to task difficulty. We address these issues by dynamically selecting supporting demonstrations according to their utility for the edit. We propose **D**ynamic **R**etriever for **I**n-Context **K**nowledge **E**diting (DR-IKE), a lightweight framework that (1) trains a BERT retriever with REINFORCE to rank demonstrations by editing reward, and (2) employs a *learnable threshold σ* to prune low-value examples, shortening the prompt when the edit is easy and expanding it when the task is hard. DR-IKE performs editing without modifying model weights, relying solely on forward passes for compatibility with black-box LLMs. On the CounterFact benchmark, it improves edit success by up to 17.1%, reduces latency by 41.6%, and preserves accuracy on unrelated queries, demonstrating scalable and adaptive knowledge editing.
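The threshold-based pruning in step (2) can be sketched as follows. This is only an illustration under stated assumptions: the threshold is passed in as a fixed number rather than learned, the retriever scores are assumed to be precomputed, and the function name `select_demonstrations` and the `min_keep` fallback are hypothetical additions not described in the abstract. The REINFORCE training of the retriever is not shown.

```python
from typing import List, Tuple


def select_demonstrations(
    scored: List[Tuple[str, float]],
    sigma: float,
    min_keep: int = 1,  # hypothetical safeguard, not from the paper
) -> List[str]:
    """Keep demonstrations whose retriever score reaches the threshold sigma.

    scored: (demonstration text, retriever score) pairs.
    """
    # Rank demonstrations from highest to lowest score.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)

    # Prune low-value examples: the prompt shrinks when few clear the bar.
    kept = [demo for demo, score in ranked if score >= sigma]

    # Assumed fallback so the prompt is never left without any demonstration.
    if len(kept) < min_keep:
        kept = [demo for demo, _ in ranked[:min_keep]]
    return kept
```

Under this sketch, a query whose candidates mostly score above the threshold yields a longer prompt, and one with few high-scoring candidates yields a shorter prompt.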
2024
RESTful-Llama: Connecting User Queries to RESTful APIs
Han Xu | Ruining Zhao | Jindong Wang | Haipeng Chen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Recent advancements in Large Language Models (LLMs) have showcased exceptional performance in zero-shot learning and reasoning tasks. However, integrating these models with external tools, a crucial need for real-world applications, remains a significant challenge. We propose RESTful-Llama, a novel framework designed to enable Llama 3.1 to transform natural language instructions into effective RESTful API calls. To enhance the fine-tuning process, we introduce DOC_Mine, a method to generate fine-tuning datasets from public API documentation. RESTful-Llama distinguishes itself by enabling open-source LLMs to efficiently interact with and adapt to any REST API system. Experiments demonstrate a 31.9% improvement in robustness and a 2.33x increase in efficiency compared to existing methods.
2019
An ensemble CNN method for biomedical entity normalization
Pan Deng | Haipeng Chen | Mengyao Huang | Xiaowen Ruan | Liang Xu
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks
Different representations of the same concept can often be seen in scientific reports and publications. Entity normalization (or entity linking) is the task of matching these different representations to their standard concepts. In this paper, we present a two-step ensemble CNN method that normalizes microbiology-related entities in free text to concepts in standard dictionaries. The method is capable of linking entities when only a small microbiology-related biomedical corpus is available for training, and it achieved reasonable performance in the online test of the BioNLP-OST19 shared task Bacteria Biotope.