Yi Du

2026

Recent advances in large language models (LLMs) and text-aware graph learning have increased interest in reasoning over text-attributed graphs (TAGs). In many real-world settings such graphs are inherently heterogeneous, yet most existing benchmarks remain largely homogeneous in structure. As a result, the lack of large-scale benchmarks for heterogeneous text-attributed graphs has hindered systematic evaluation and fair comparison of existing methods. In this work, we introduce CITE (**C**atalytic **I**nformation **T**extual **E**ntities Graph), the first and largest heterogeneous text-attributed citation graph benchmark for catalytic materials. CITE contains over 438K nodes and 1.2M edges spanning four node types and four relation types, with rich node-level textual information. We establish standardized evaluation protocols for node classification and link prediction, and conduct ablation studies to assess the impact of graph heterogeneity and textual attributes. Using CITE, we benchmark four classes of learning paradigms: homogeneous graph models, heterogeneous graph models, LLM-centric models, and hybrid LLM–graph models. By providing a large-scale heterogeneous text-attributed benchmark together with standardized evaluation protocols and comprehensive baselines, CITE enables systematic assessment across diverse modeling paradigms and offers new insights into text-aware and LLM-enhanced graph learning. The dataset, codebase, and evaluation suite are publicly available.

AlphaEdit+: Model Editing in the Presence of Conflicting and Inconsistent Knowledge
Qing Liu | Jianhao Zhang | Ou Wu | Michael Ng | Yi Du
Findings of the Association for Computational Linguistics: ACL 2026

Knowledge editing is a crucial technique for day-to-day updates of LLMs, requiring a balance between accurately modifying incorrect knowledge and preserving existing information. The recently proposed AlphaEdit method achieves competitive editing performance by updating parameters under null-space constraints. However, our theoretical analysis reveals that AlphaEdit struggles when edits involve strong knowledge conflicts and inconsistencies. To address this, we propose a new editing method, AlphaEdit+, featuring three key improvements: 1) relaxing the null-space constraints by adding a matrix perturbation, obtained through optimization, to resolve conflicts between new and preserved knowledge; 2) introducing a weighting scheme on previously updated knowledge constraints to mitigate conflicts between new and historical edits; 3) developing a value smoothing algorithm to resolve severe knowledge inconsistencies. These enhancements collectively ensure robust editing while maintaining model coherence. Comprehensive experiments show that AlphaEdit+ not only resolves the brittleness of the original method on carefully constructed challenging datasets but also outperforms AlphaEdit on existing benchmark datasets.
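
As a rough illustration of what "updating parameters under null-space constraints" can mean, the sketch below projects each row of a weight update onto the subspace orthogonal to a set of preserved key vectors, so that the projected update maps every preserved key to zero. It is not the authors' implementation: the function names, the toy data, and the use of Gram-Schmidt orthonormalization are assumptions made for this example only.

```python
# Minimal sketch (assumed names, toy data): null-space projection of an update.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def project_update(delta, preserved_keys):
    """Remove from each row of `delta` its component in the span of the keys.

    The result satisfies projected @ k == 0 for every preserved key k, so
    adding it to a weight matrix leaves the outputs for those keys unchanged.
    """
    basis = orthonormal_basis(preserved_keys)
    projected = []
    for row in delta:
        r = list(row)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        projected.append(r)
    return projected

def apply(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, vector) for row in matrix]

# Hypothetical toy data: two preserved keys in a 3-dimensional key space.
preserved_keys = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
delta = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
projected = project_update(delta, preserved_keys)
```

In this toy case only the third coordinate of each row survives, because the two keys span the first two coordinates. A hard constraint of this kind leaves no room for an update that must also change what a preserved key maps to.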

2025

Temporal Knowledge Graphs (TKGs) incorporate temporal features to express the transience of knowledge by describing when facts occur. TKG extrapolation aims to infer possible future facts from known history and has garnered significant attention in recent years. Some existing methods treat a TKG as a sequence of independent subgraphs to model temporal evolution patterns, demonstrating impressive reasoning performance. However, they still have limitations: 1) in modeling subgraph semantic evolution, they usually neglect the internal structural interactions between subgraphs, which are crucial for encoding TKGs; 2) they overlook the potential smooth features that do not lead to semantic changes, which should be distinguished from the semantic evolution process. Therefore, we propose the Disentangled Multi-span Evolutionary Network (DiMNet) for TKG reasoning. Specifically, we design a multi-span evolution strategy that captures local neighbor features while perceiving historical neighbor semantic information, thus enabling internal interactions between subgraphs during the evolution process. To maximize the capture of semantic change patterns, we design a disentangling component that adaptively separates nodes’ active and stable features, which are used to dynamically control the influence of historical semantics on future evolution. Extensive experiments demonstrate that DiMNet achieves strong performance in TKG reasoning, outperforming the state of the art by up to 22.7% in MRR.
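
For readers unfamiliar with the MRR metric cited above, the following sketch shows the standard mean reciprocal rank computation over ranked candidate lists. The helper names and the toy scores are assumptions for illustration; the abstract does not describe the paper's exact evaluation protocol (for example, any filtering of known facts), so none is modeled here.

```python
# Minimal sketch (assumed names, toy data): mean reciprocal rank.

def rank_of_true(scores, true_index):
    """1-based rank of the true candidate: one plus the number of
    candidates scored strictly higher than it."""
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; ranks are 1-based."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hypothetical toy queries: (candidate scores, index of the true entity).
queries = [
    ([0.9, 0.1, 0.3], 0),  # true entity ranked 1st
    ([0.2, 0.8, 0.5], 2),  # true entity ranked 2nd
    ([0.7, 0.6, 0.1], 2),  # true entity ranked 3rd
]
ranks = [rank_of_true(scores, idx) for scores, idx in queries]
mrr = mean_reciprocal_rank(ranks)
```

MRR lies in (0, 1], and a higher value means the true entity tends to appear nearer the top of the ranking.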

2024

DP-CRE: Continual Relation Extraction via Decoupled Contrastive Learning and Memory Structure Preservation
Mengyi Huang | Meng Xiao | Ludi Wang | Yi Du
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Continual Relation Extraction (CRE) aims to incrementally learn relation knowledge from a non-stationary stream of data. Since the introduction of new relational tasks can overshadow previously learned information, catastrophic forgetting becomes a significant challenge in this domain. Current replay-based training paradigms prioritize all data uniformly and train on memory samples for multiple rounds, which results in overfitting to old tasks and a pronounced bias toward new tasks because of the imbalance of the replay set. To handle this problem, we introduce the DecouPled CRE (DP-CRE) framework, which decouples the process of prior information preservation from that of new knowledge acquisition. The framework examines alterations in the embedding space as new relation classes emerge, managing the preservation and acquisition of knowledge separately. Extensive experiments show that DP-CRE significantly outperforms other CRE baselines across two datasets.

2023

Autodive: An Integrated Onsite Scientific Literature Annotation Tool
Yi Du | Ludi Wang | Mengyi Huang | Dongze Song | Wenjuan Cui | Yuanchun Zhou
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Scientific literature is typically available in Adobe’s Portable Document Format (PDF), which is convenient for scientists to read. Compared with annotating raw text, annotating directly on PDF documents can greatly improve the labeling efficiency of scientists, whose annotation costs are very high. In this paper, we present Autodive, an integrated onsite scientific literature annotation tool for natural scientists and Natural Language Processing (NLP) researchers. The tool provides six core annotation functions that support the whole lifecycle of corpus generation: i) annotation project management, ii) resource management, iii) ontology management, iv) manual annotation, v) onsite auto annotation, and vi) annotation task statistics. Two experiments are carried out to verify the efficiency of the presented tool. A live demo of Autodive is available at http://autodive.sciwiki.cn. The source code is available at https://github.com/Autodive.

Co-authors

Qi Hao 1

Ou Wu 1
