Sun Kim


2017

BioCreative VI Precision Medicine Track: creating a training corpus for mining protein-protein interactions affected by mutations
Rezarta Islamaj Doğan | Andrew Chatr-aryamontri | Sun Kim | Chih-Hsuan Wei | Yifan Peng | Donald Comeau | Zhiyong Lu
BioNLP 2017

The Precision Medicine Track in BioCreative VI aims to bring together the BioNLP community for a novel challenge focused on mining the biomedical literature in search of mutations and protein-protein interactions (PPI). In order to support this track with an effective training dataset with limited curator time, the track organizers carefully reviewed PubMed articles from two different sources: curated public PPI databases, and the results of state-of-the-art public text mining tools. We detail here the data collection, manual review and annotation process, and describe the characteristics of this training corpus. We also describe a corpus performance baseline. This analysis will provide useful information to developers and researchers for comparing and developing innovative text mining approaches for the BioCreative VI challenge and other Precision Medicine related applications.

Deep Learning for Biomedical Information Retrieval: Learning Textual Relevance from Click Logs
Sunil Mohan | Nicolas Fiorini | Sun Kim | Zhiyong Lu
BioNLP 2017

We describe a Deep Learning approach to modeling the relevance of a document’s text to a query, applied to biomedical literature. Instead of mapping each document and query to a common semantic space, we compute a variable-length difference vector between the query and document which is then passed through a deep convolution stage followed by a deep regression network to produce the estimated probability of the document’s relevance to the query. Despite the small amount of training data, this approach produces a more robust predictor than computing similarities between semantic vector representations of the query and document, and also results in significant improvements over traditional IR text factors. In the future, we plan to explore its application in improving PubMed search.

2016

PubTermVariants: biomedical term variants and their use for PubMed search
Lana Yeganova | Won Kim | Sun Kim | Rezarta Islamaj Doğan | Wanli Liu | Donald C Comeau | Zhiyong Lu | W John Wilbur
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

2015

Summarizing Topical Contents from PubMed Documents Using a Thematic Analysis
Sun Kim | Lana Yeganova | W. John Wilbur
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2012

Classifying Gene Sentences in Biomedical Literature by Combining High-Precision Gene Identifiers
Sun Kim | Won Kim | Don Comeau | W. John Wilbur
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing