Xia Zhang


2026

Fine-grained entity typing (FET) aims to assign semantically rich and contextually appropriate types to entity mentions. While recent studies have explored the use of large language models (LLMs) for this task, two key challenges persist. First, FET typically involves a large number of entity types, making it difficult for LLMs to perform accurate classification. Second, the presence of label noise in the training data introduced by automatic supervision methods hinders effective fine-tuning. To address these challenges, we propose DR-FET, a descriptor-based retrieval-augmented framework that reduces the effective label space and constructs high-precision training data under noisy supervision. Our method introduces natural language descriptors as an intermediate semantic representation for both entity mentions and types. During inference, entity descriptors are used to retrieve a small set of semantically relevant candidate types, enabling the LLM to perform fine-grained classification under explicit candidate constraints. During training, the same descriptor and retrieval mechanism is reused to identify high-confidence instances from distantly supervised data, prioritizing label precision for efficient fine-tuning. Experiments on two widely used benchmarks demonstrate that the proposed method consistently outperforms existing fine-grained entity typing approaches under noisy supervision.
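The inference-time retrieval step described in this abstract can be illustrated with a minimal sketch. It is not the DR-FET implementation: the bag-of-words `embed` function stands in for whatever encoder the authors use, and `retrieve_candidates`, `TYPE_DESCRIPTORS`, and the example descriptors are hypothetical names and data chosen only for illustration. The sketch shows how an entity descriptor could be matched against type descriptors to narrow a large label space to a few candidates before an LLM makes the final choice.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: a stand-in for a real text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(entity_descriptor, type_descriptors, k=2):
    """Return the k type names whose descriptors are most similar to the entity descriptor."""
    query = embed(entity_descriptor)
    scored = [
        (cosine(query, embed(description)), name)
        for name, description in type_descriptors.items()
    ]
    # Highest similarity first; ties are broken alphabetically by type name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical type inventory with natural language descriptors.
TYPE_DESCRIPTORS = {
    "/person/athlete": "a person who competes in sports",
    "/person/politician": "a person who holds or seeks political office",
    "/organization/company": "a business organization that sells goods or services",
    "/location/city": "a large settlement where people live",
}

candidates = retrieve_candidates(
    "a person who plays professional sports for a team", TYPE_DESCRIPTORS, k=2
)
print(candidates)
```

In a pipeline of this shape, the returned candidate list would be placed in the LLM prompt as the only types it may choose from. The abstract states that the same mechanism is reused during training to select high-confidence instances, but it does not give the selection rule, so that step is not sketched here.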

2022

Entity Alignment (EA) aims to find equivalent entities between two Knowledge Graphs (KGs). While numerous neural EA models have been devised, they are mainly learned using labelled data only. In this work, we argue that different entities within one KG should have compatible counterparts in the other KG due to the potential dependencies among the entities. Making compatible predictions thus should be one of the goals of training an EA model along with fitting the labelled data: this aspect however is neglected in current methods. To power neural EA models with compatibility, we devise a training framework by addressing three problems: (1) how to measure the compatibility of an EA model; (2) how to inject the property of being compatible into an EA model; (3) how to optimise parameters of the compatibility model. Extensive experiments on widely-used datasets demonstrate the advantages of integrating compatibility within EA models. In fact, state-of-the-art neural EA models trained within our framework using just 5% of the labelled data can achieve comparable effectiveness with supervised training using 20% of the labelled data.

With the development of medical digitization, the extraction and structuring of Electronic Medical Records (EMRs) have become challenging but fundamental tasks. Accurately and automatically extracting structured information from medical dialogues is especially difficult because the information must be inferred from complex interactions between the doctor and the patient. To this end, we propose a speaker-aware co-attention framework for medical dialogue information extraction. To better use a pre-trained language representation model to capture the semantics of the utterance and the candidate item, we develop a speaker-aware dialogue encoder with multi-task learning, which takes the speaker’s identity into account. To handle the complex interactions between utterances and the correlations between utterances and candidate items, we propose a co-attention fusion network that aggregates the utterance information. We evaluate our framework on public medical dialogue extraction datasets and show that it outperforms state-of-the-art methods by a large margin. Code will be made publicly available upon acceptance.

2013