Cathy Wu


Noise Reduction Methods for Distantly Supervised Biomedical Relation Extraction
Gang Li | Cathy Wu | K. Vijay-Shanker
BioNLP 2017

Distant supervision has been applied to automatically generate labeled data for biomedical relation extraction. Noise exists in both the positively and negatively labeled data and affects the performance of supervised machine learning methods. In this paper, we propose three novel heuristics, based on the notions of proximity, trigger words, and pattern confidence, that leverage lexical and syntactic information to reduce the level of noise in the distantly labeled data. Experiments on three different tasks (extraction of protein-protein interactions, miRNA-gene regulation relations, and protein-localization events) show that the proposed methods can improve the F-score over the baseline by 6, 10, and 14 points, respectively. We also show that when the models are configured to output high-confidence results, high precision can be obtained using the proposed methods, making them promising for facilitating manual curation of databases.

Identifying Comparative Structures in Biomedical Text
Samir Gupta | A.S.M. Ashique Mahmood | Karen Ross | Cathy Wu | K. Vijay-Shanker
BioNLP 2017

Comparison sentences are very commonly used by authors in biomedical literature to report results of experiments. In such comparisons, authors typically make observations under two different scenarios. In this paper, we present a system to automatically identify such comparative sentences and their components, i.e., the compared entities, the scale of the comparison, and the aspect on which the entities are being compared. Our methodology is based on dependencies obtained by applying a parser to extract a wide range of comparison structures. We evaluated our system for its effectiveness in identifying comparisons and their components. The system achieved an F-score of 0.87 for comparison sentence identification and 0.77-0.81 for identifying its components.


An extended dependency graph for relation extraction in biomedical texts
Yifan Peng | Samir Gupta | Cathy Wu | Vijay Shanker
Proceedings of BioNLP 15


Dynamically Generating a Protein Entity Dictionary Using Online Resources
Hongfang Liu | Zhangzhi Hu | Cathy Wu
Proceedings of the ACL Interactive Poster and Demonstration Sessions


A Study of Text Categorization for Model Organism Databases
Hongfang Liu | Cathy Wu
HLT-NAACL 2004 Workshop: Linking Biological Literature, Ontologies and Databases