Abstract
PICO recognition is an information extraction task that identifies participant, intervention, comparator, and outcome information in clinical literature. Manually identifying PICO information is the most time-consuming step in conducting a systematic review (SR), a process that is already labor-intensive. A lack of large, diverse annotated corpora restricts innovation in, and adoption of, automated PICO recognition systems. The largest available PICO entity/span corpus is manually annotated, which is too expensive for most of the scientific community. To break through this bottleneck, we propose DISTANT-CTO, a novel distantly supervised PICO entity extraction approach that uses the clinical trials literature to generate a massive weakly-labeled dataset with more than a million ‘Intervention’ and ‘Comparator’ entity annotations. We train distant NER (named-entity recognition) models on this weakly-labeled dataset and demonstrate that they outperform even sophisticated models trained on the manually annotated dataset, with a 2% F1 improvement on the Intervention entity of the PICO benchmark and more than 5% improvement when the weakly-labeled data is combined with the manually annotated dataset. We investigate the generalizability of our approach and achieve an impressive F1 score on another domain-specific PICO benchmark. The approach is not only zero-cost but also scales to a constant stream of PICO entity annotations.
- Anthology ID:
- 2022.bionlp-1.34
- Volume:
- Proceedings of the 21st Workshop on Biomedical Language Processing
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 345–358
- URL:
- https://aclanthology.org/2022.bionlp-1.34
- DOI:
- 10.18653/v1/2022.bionlp-1.34
- Cite (ACL):
- Anjani Dhrangadhariya and Henning Müller. 2022. DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 345–358, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature (Dhrangadhariya & Müller, BioNLP 2022)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/2022.bionlp-1.34.pdf
- Code:
- anjani-dhrangadhariya/distant-cto