Abstract
Systems that build on information extraction are typically challenged to extract knowledge that, while correct, is not yet well known. We hypothesize that a good confidence measure for relational information has the property that such interesting information is found between information extracted with very high confidence and information extracted with very low confidence. We discuss confidence estimation for protein-protein relation discovery in biomedical literature. As facts reported in papers take some time to be validated and recorded in biomedical databases, this task gives rise to large quantities of unknown but potentially true candidate relations. It is thus important to rank them based on supporting evidence rather than discard them. In this paper, we discuss this task and propose different approaches for confidence estimation and a pipeline to evaluate such methods. We show that the most straightforward approach, a combination of different confidence measures from pipeline modules, seems not to work well. We discuss this negative result and pinpoint potential future research directions.
- Anthology ID:
- W17-8008
- Volume:
- Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
- Month:
- September
- Year:
- 2017
- Address:
- Varna, Bulgaria
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 55–63
- URL:
- https://doi.org/10.26615/978-954-452-044-1_008
- DOI:
- 10.26615/978-954-452-044-1_008
- Cite (ACL):
- Camilo Thorne and Roman Klinger. 2017. Towards Confidence Estimation for Typed Protein-Protein Relation Extraction. In Proceedings of the Biomedical NLP Workshop associated with RANLP 2017, pages 55–63, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal):
- Towards Confidence Estimation for Typed Protein-Protein Relation Extraction (Thorne & Klinger, RANLP 2017)
- PDF:
- https://doi.org/10.26615/978-954-452-044-1_008