Wenpeng Lu


2024

Medical Entity Disambiguation with Medical Mention Relation and Fine-grained Entity Knowledge
Wenpeng Lu | Guobiao Zhang | Xueping Peng | Hongjiao Guan | Shoujin Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Medical entity disambiguation (MED), the task of mapping ambiguous medical mentions to structured candidate entities in knowledge bases (KBs), plays a crucial role in natural language processing and the biomedical domain. However, existing MED methods often fail to fully utilize the knowledge within medical KBs and overlook essential interactions between medical mentions and candidate entities, resulting in knowledge- and interaction-inefficient modeling and suboptimal disambiguation performance. To address these limitations, this paper proposes a novel approach, MED with Medical Mention Relation and Fine-grained Entity Knowledge (MMR-FEK). Specifically, MMR-FEK incorporates a mention relation fusion module and an entity knowledge fusion module, followed by an interaction module. The former employs a relation graph convolutional network to fuse relation information between medical mentions and enhance mention representations, while the latter leverages an attention mechanism to fuse synonym and type information of candidate entities and enhance entity representations. Afterwards, the interaction module employs a bidirectional attention mechanism to capture interactions between mentions and entities and generate the matching representation. Extensive experiments on two publicly available real-world datasets demonstrate MMR-FEK’s superiority over state-of-the-art (SOTA) MED baselines across all metrics. Our source code is publicly available.
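
As a rough illustration of the interaction module described above, the following is a minimal sketch of a bidirectional attention step between a mention representation and a candidate-entity representation. All names, shapes, and the mean-pooling at the end are illustrative assumptions, not the authors’ implementation.

```python
# Minimal sketch of a bidirectional attention interaction between a
# mention representation and a candidate-entity representation.
# All names and dimensions are illustrative; this is not the authors' code.
import torch
import torch.nn.functional as F

def bidirectional_attention(mention, entity):
    """mention: (m, d) token states; entity: (n, d) token states."""
    scores = mention @ entity.T                      # (m, n) similarity matrix
    m2e = F.softmax(scores, dim=-1) @ entity         # mention attends to entity
    e2m = F.softmax(scores.T, dim=-1) @ mention      # entity attends to mention
    # Pool both directions into a single matching representation.
    return torch.cat([m2e.mean(0), e2m.mean(0)], dim=-1)  # (2d,)

mention = torch.randn(5, 128)   # 5 mention tokens, hidden size 128
entity = torch.randn(8, 128)    # 8 entity tokens
print(bidirectional_attention(mention, entity).shape)  # torch.Size([256])
```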

CHECKWHY: Causal Fact Verification via Argument Structure
Jiasheng Si | Yibo Zhao | Yingjie Zhu | Haiyang Zhu | Wenpeng Lu | Deyu Zhou
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With the growing complexity of fact verification tasks, concern with “thoughtful” reasoning capabilities is increasing. However, recent fact verification benchmarks mainly focus on checking a narrow scope of semantic factoids within claims and lack an explicit logical reasoning process. In this paper, we introduce CHECKWHY, a challenging dataset tailored to a novel causal fact verification task: checking the truthfulness of the causal relation within claims through rigorous reasoning steps. CHECKWHY consists of over 19K “why” claim-evidence-argument structure triplets with supports, refutes, and not enough info labels. Each argument structure is composed of connected evidence, representing the reasoning process that begins with foundational evidence and progresses toward establishing the claim. Through extensive experiments on state-of-the-art models, we validate the importance of incorporating the argument structure for causal fact verification. Moreover, automated and human evaluation of argument structure generation reveals the difficulty that fine-tuned models and Chain-of-Thought-prompted LLMs have in producing satisfactory argument structures, leaving considerable room for future improvement.
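
To make the dataset’s structure concrete, here is a hypothetical schema for a single CHECKWHY instance, written as Python dataclasses. The field names and types are assumptions for illustration only and do not reflect the released data format.

```python
# A hypothetical schema for one CHECKWHY instance: a causal claim, its
# evidence pool, and an argument structure of connected reasoning steps.
# Field names are assumptions for illustration, not the released format.
from dataclasses import dataclass

@dataclass
class ArgumentStep:
    premises: list[str]   # evidence (or earlier conclusions) this step builds on
    conclusion: str       # the intermediate or final conclusion it establishes

@dataclass
class CheckWhyInstance:
    claim: str                    # a "why" claim asserting a causal relation
    evidence: list[str]           # foundational evidence sentences
    argument: list[ArgumentStep]  # steps from evidence toward the claim
    label: str                    # "supports" | "refutes" | "not enough info"
```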

2022

Word Sense Disambiguation with Knowledge-Enhanced and Local Self-Attention-based Extractive Sense Comprehension
Guobiao Zhang | Wenpeng Lu | Xueping Peng | Shoujin Wang | Baoshuo Kan | Rui Yu
Proceedings of the 29th International Conference on Computational Linguistics

Word sense disambiguation (WSD), identifying the most suitable meaning of an ambiguous word in a given context according to a predefined sense inventory, is one of the most classical and challenging tasks in natural language processing. Benefiting from the powerful ability of deep neural networks, WSD has achieved great advances in recent years. Reformulating WSD as a text span extraction task is an effective approach: the model accepts the sentence context of an ambiguous word together with the definitions of all its candidate senses simultaneously, and is required to extract the text span corresponding to the correct sense. However, this approach depends merely on a short definition to learn each sense representation, which neglects abundant semantic knowledge from related senses and leads to data-inefficient learning and suboptimal WSD performance. To address these limitations, we propose a novel WSD method with Knowledge-Enhanced and Local Self-Attention-based Extractive Sense Comprehension (KELESC). Specifically, a knowledge-enhanced method is proposed to enrich the semantic representation by incorporating additional examples and definitions of related senses in WordNet. Then, to avoid the huge computational complexity induced by the additional information, a local self-attention mechanism is utilized to constrain attention to be local, which allows longer input texts without a large-scale computational burden. Extensive experimental results demonstrate that KELESC achieves better performance than baseline models on public benchmark datasets.
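
The local self-attention idea above can be illustrated with a short sketch: each position is restricted to attend within a fixed-radius window, so attention cost grows roughly linearly with input length rather than quadratically. The windowed-mask formulation below is one common way to realize this, offered as an assumption rather than the KELESC implementation.

```python
# Minimal sketch of a local (windowed) self-attention mask: each position
# may only attend within a fixed-radius window around itself.
# Illustrative only; not the authors' code.
import torch

def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask of shape (seq_len, seq_len); True marks allowed pairs."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window

mask = local_attention_mask(seq_len=8, window=2)
scores = torch.randn(8, 8)                         # raw attention scores
scores = scores.masked_fill(~mask, float("-inf"))  # block out-of-window pairs
weights = scores.softmax(dim=-1)                   # each row sums to 1 over its window
```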

2020

Intra-Correlation Encoding for Chinese Sentence Intention Matching
Xu Zhang | Yifeng Li | Wenpeng Lu | Ping Jian | Guoqiang Zhang
Proceedings of the 28th International Conference on Computational Linguistics

Sentence intention matching is vital for natural language understanding. In the Chinese sentence intention matching task in particular, the ambiguity of Chinese words makes semantic loss or semantic confusion more likely to occur during encoding. Although existing methods enrich text representations with pre-trained word embeddings to mitigate this problem, owing to the particularity of Chinese text, different granularities of pre-trained embeddings affect the semantic description of a text differently. In this paper, we propose an effective approach that combines character-granularity and word-granularity features to perform sentence intention matching, and we utilize soft alignment attention to enhance the local information of sentences at the corresponding levels. The proposed method captures sentence feature information from multiple perspectives, as well as correlation information between the different levels of the sentences. Evaluated on the BQ and LCQMC datasets, our model achieves remarkable results and demonstrates performance better than or comparable to BERT-based models.
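
In the spirit of the soft alignment attention described above, the following minimal sketch aligns a character-granularity encoding with a word-granularity encoding of the same sentence and builds an ESIM-style enhanced feature. Shapes, names, and the enhancement formula are illustrative assumptions, not the paper’s exact architecture.

```python
# Minimal sketch of soft alignment attention between two granularities of
# the same sentence (character-level vs. word-level). Illustrative only.
import torch
import torch.nn.functional as F

def soft_align(a, b):
    """a: (la, d), b: (lb, d). Returns each sequence aligned to the other."""
    e = a @ b.T                             # (la, lb) alignment scores
    a_aligned = F.softmax(e, dim=-1) @ b    # b-aware view of a: (la, d)
    b_aligned = F.softmax(e.T, dim=-1) @ a  # a-aware view of b: (lb, d)
    return a_aligned, b_aligned

chars = torch.randn(12, 64)   # character-level states of a sentence
words = torch.randn(6, 64)    # word-level states of the same sentence
char_view, word_view = soft_align(chars, words)
# ESIM-style enhancement: concatenate original, aligned, difference, product.
enhanced = torch.cat([chars, char_view, chars - char_view, chars * char_view],
                     dim=-1)  # (12, 256)
```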

2017

QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings
Fanqing Meng | Wenpeng Lu | Yuteng Zhang | Jinyong Cheng | Yuehan Du | Shuwang Han
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper reports the details of our submissions to Task 1 of SemEval-2017, which assesses the semantic textual similarity of two sentences or texts. We submit three unsupervised systems based on word embeddings; the runs differ only in the preprocessing applied to the evaluation data. The best of these systems achieves a Pearson correlation of 0.6887. Unsurprisingly, the results of our runs demonstrate that data preprocessing, such as tokenization, lemmatization, extraction of content words, and removal of stop words, is helpful and plays a significant role in improving model performance.
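
A minimal sketch of this kind of unsupervised system: average the word embeddings of each preprocessed sentence and score the pair with cosine similarity. The toy vocabulary and vectors below are assumptions for illustration, not the submitted system.

```python
# Minimal sketch of an unsupervised STS system based on averaged word
# embeddings and cosine similarity. Toy vectors; illustrative only.
import numpy as np

def sentence_vector(tokens, embeddings, dim=3):
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

embeddings = {"cat":  np.array([1.0, 0.2, 0.0]),
              "dog":  np.array([0.9, 0.3, 0.1]),
              "sits": np.array([0.1, 1.0, 0.4])}
s1 = ["cat", "sits"]   # tokens after tokenization, stop-word removal, etc.
s2 = ["dog", "sits"]
print(cosine(sentence_vector(s1, embeddings), sentence_vector(s2, embeddings)))
```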

QLUT at SemEval-2017 Task 2: Word Similarity Based on Word Embedding and Knowledge Base
Fanqing Meng | Wenpeng Lu | Yuteng Zhang | Ping Jian | Shumin Shi | Heyan Huang
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper presents the details of our system submissions to Task 2 of SemEval-2017. We take part in Subtask 1, the English monolingual subtask, which evaluates the semantic similarity of two words. The runs are assessed by standard Pearson and Spearman correlations against the official gold-standard set. The best of our runs scores 0.781 (Final). Our runs mainly make use of word embeddings and a knowledge-based method. The results demonstrate that the combined method is effective for computing word similarity, while the word-embedding and knowledge-based techniques each still need deeper refinement in their details.
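
One plausible reading of the combined method is a simple interpolation between an embedding cosine score and a WordNet-based score; the sketch below uses Wu-Palmer similarity via NLTK as the knowledge-based component. The interpolation weight alpha, the toy vectors, and the choice of Wu-Palmer are assumptions, not necessarily the authors’ exact method.

```python
# Minimal sketch of combining an embedding cosine score with a WordNet-based
# (knowledge-based) score via simple interpolation. Illustrative only.
# Requires: pip install nltk && python -m nltk.downloader wordnet
import numpy as np
from nltk.corpus import wordnet as wn

def embedding_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def wordnet_sim(w1, w2):
    """Best Wu-Palmer similarity over all synset pairs; 0 if none found."""
    pairs = [(a, b) for a in wn.synsets(w1) for b in wn.synsets(w2)]
    scores = [a.wup_similarity(b) or 0.0 for a, b in pairs]
    return max(scores, default=0.0)

def combined_sim(w1, w2, vec, alpha=0.5):
    return alpha * embedding_sim(vec[w1], vec[w2]) + (1 - alpha) * wordnet_sim(w1, w2)

vec = {"car":        np.array([0.8, 0.1, 0.3]),   # toy embeddings
       "automobile": np.array([0.7, 0.2, 0.3])}
print(combined_sim("car", "automobile", vec))
```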

2016

BIT at SemEval-2016 Task 1: Sentence Similarity Based on Alignments and Vector with the Weight of Information Content
Hao Wu | Heyan Huang | Wenpeng Lu
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)