Jiarun Cao

2023

Gaussian Distributed Prototypical Network for Few-shot Genomic Variant Detection
Jiarun Cao | Niels Peek | Andrew Renehan | Sophia Ananiadou
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

Automatically identifying genetic mutations in the cancer literature with text mining has become an important way to study the vast body of cancer medical literature. However, new knowledge about genetic variants accumulates rapidly, and current supervised learning models struggle to discover these unknown entity types. Few-shot learning allows a model to generalize effectively to new entity types, but it has not yet been explored for cancer mutation detection. This paper addresses cancer mutation detection with few-shot learning paradigms. We propose the GDPN framework, which models label dependency from the training examples in the support set and approximates the transition scores with a Gaussian distribution. Experiments on three benchmark cancer mutation datasets show the effectiveness of the proposed model.

2021

This paper presents our winning contribution to SemEval 2021 Task 8: MeasEval. The task is to identify counts and measurements in clinical scientific discourse, including quantities, entities, properties, qualifiers, units, modifiers, and their mutual relations. The task can be formulated as a joint entity and relation extraction problem. Accordingly, we propose CONNER, a cascade count and measurement extraction tool that identifies entities and their corresponding relations in a two-step pipeline model. We describe the proposed model in detail, and we also investigate the impact of its essential modules and of our in-process technical schemes.

GenerativeRE: Incorporating a Novel Copy Mechanism and Pretrained Model for Joint Entity and Relation Extraction
Jiarun Cao | Sophia Ananiadou
Findings of the Association for Computational Linguistics: EMNLP 2021

Previous neural Seq2Seq models have proven effective for jointly extracting relation triplets. However, most of these models suffer from incompleteness and disorder problems when they extract multi-token entities from input sentences. To tackle these problems, we propose a generative, multi-task learning framework named GenerativeRE. We first propose a special entity labelling method applied to both input and output sequences. During training, GenerativeRE fine-tunes a pre-trained generative model and learns the special entity labels simultaneously. During inference, we propose a novel copy mechanism equipped with three mask strategies, which generates the most probable tokens by narrowing the scope of the model decoder. Experimental results show that our model achieves F1 improvements of 4.6% and 0.9% over the current state-of-the-art methods on the NYT24 and NYT29 benchmark datasets, respectively.

2020

Electronic Medical Records (EMRs) have become key components of modern medical care systems. Despite their merits, many doctors find writing EMRs time-consuming and tedious. We believe that automatically converting medical dialogues into EMRs can greatly reduce doctors' burden, and extracting information from medical dialogues is an essential step toward this goal. To this end, we annotate online medical consultation dialogues in a sliding-window style, which is much easier than sequential labeling annotation. We then propose a Medical Information Extractor (MIE) for medical dialogues. MIE extracts mentioned symptoms, surgeries, tests, and other information, together with their corresponding status. To tackle the particular challenges of the task, MIE uses a deep matching architecture that takes dialogue turn interaction into account. Experimental results demonstrate that MIE is a promising solution for extracting medical information from doctor-patient dialogues.

Co-authors

Xi Chen (1)
