Abstract
Distant supervision has been applied to automatically generate labeled data for biomedical relation extraction. Noise exists in both positively and negatively labeled data and affects the performance of supervised machine learning methods. In this paper, we propose three novel heuristics based on the notions of proximity, trigger words, and pattern confidence, which leverage lexical and syntactic information to reduce the level of noise in the distantly labeled data. Experiments on three different tasks (extraction of protein-protein interactions, miRNA-gene regulation relations, and protein-localization events) show that the proposed methods improve the F-score over the baseline by 6, 10, and 14 points, respectively. We also show that when the models are configured to output only high-confidence results, the proposed methods achieve high precision, making them promising for facilitating manual curation of databases.
- Anthology ID:
- W17-2323
- Volume:
- BioNLP 2017
- Month:
- August
- Year:
- 2017
- Address:
- Vancouver, Canada
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 184–193
- URL:
- https://aclanthology.org/W17-2323
- DOI:
- 10.18653/v1/W17-2323
- Cite (ACL):
- Gang Li, Cathy Wu, and K. Vijay-Shanker. 2017. Noise Reduction Methods for Distantly Supervised Biomedical Relation Extraction. In BioNLP 2017, pages 184–193, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Noise Reduction Methods for Distantly Supervised Biomedical Relation Extraction (Li et al., BioNLP 2017)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W17-2323.pdf