BIOPSY - Biomarkers In Oncology: Pipeline for Structured Yielding

Sanya A. Chetwani, Jaseem Mahmmdla


Abstract
In clinical science, biomarkers are crucial indicators for early cancer detection, prognosis, and personalized treatment decisions. Despite their importance, extracting biomarkers and their levels from clinical texts remains a complex and underexplored problem in natural language processing research. In this paper, we present BIOPSY, an end-to-end pipeline that integrates four components: a domain-adapted biomarker entity recognition model, a relation extraction model that links biomarkers to their respective mutations, a biomarker-type classifier, and a tailored algorithm that captures biomarker expression levels. Evaluated on 5,000 real-world clinical texts, our system achieved an overall F1 score of 0.86 in the oncology domain and 0.87 in the neuroscience domain. These results indicate that the pipeline adapts across clinical sources, including trial records, research papers, and medical notes, offering the first comprehensive solution for end-to-end, context-aware biomarker extraction and interpretation in clinical research.
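To make the four-stage structure described in the abstract concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation: the abstract describes trained models, whereas this sketch substitutes a dictionary lookup and regular expressions for every stage. The lexicon, the type labels, the regex patterns, and the names `LEXICON`, `find_mentions`, `extract`, and `BiomarkerRecord` are all assumptions introduced for illustration only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy lexicon mapping biomarker names to assumed type labels.
# The paper uses a trained NER model and a trained type classifier instead.
LEXICON = {"EGFR": "gene", "HER2": "protein", "PD-L1": "protein"}

# Assumed patterns: point mutations such as "L858R", and simple level cues.
MUTATION_RE = re.compile(r"\b[A-Z]\d+[A-Z]\b")
LEVEL_RE = re.compile(r"\b(?:positive|negative|high|low)\b|\d+(?:\.\d+)?\s*%", re.I)


@dataclass
class BiomarkerRecord:
    name: str
    type: str
    mutation: Optional[str]
    level: Optional[str]


def find_mentions(text: str) -> List[Tuple[int, int, str]]:
    """Stage 1 stand-in: dictionary lookup in place of entity recognition."""
    mentions = []
    for name in LEXICON:
        # Guard against matching inside longer tokens such as "anti-EGFR".
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        for m in re.finditer(pattern, text):
            mentions.append((m.start(), m.end(), name))
    return sorted(mentions)  # order by position in the text


def extract(text: str) -> List[BiomarkerRecord]:
    """Run all four toy stages and return one record per biomarker mention."""
    mentions = find_mentions(text)
    records = []
    for i, (_, end, name) in enumerate(mentions):
        # Stages 2 and 4 stand-in: search only the text between this mention
        # and the next one, a crude proxy for relation and level linking.
        next_start = mentions[i + 1][0] if i + 1 < len(mentions) else len(text)
        window = text[end:next_start]
        mutation = MUTATION_RE.search(window)
        level = LEVEL_RE.search(window)
        records.append(
            BiomarkerRecord(
                name=name,
                type=LEXICON[name],  # stage 3 stand-in: lexicon lookup
                mutation=mutation.group(0) if mutation else None,
                level=level.group(0) if level else None,
            )
        )
    return records


if __name__ == "__main__":
    note = "EGFR L858R mutation detected; PD-L1 expression 60%, HER2 negative."
    for record in extract(note):
        print(record)
```

The sketch only shows how the outputs of the four stages could be combined into one structured record per biomarker. The windowing heuristic would fail on sentences where a mutation or level precedes its biomarker, which is one reason learned relation extraction is preferable in practice.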
Anthology ID:
2025.emnlp-industry.159
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2025
Address:
Suzhou (China)
Editors:
Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2313–2321
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.159/
Cite (ACL):
Sanya A. Chetwani and Jaseem Mahmmdla. 2025. BIOPSY - Biomarkers In Oncology: Pipeline for Structured Yielding. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 2313–2321, Suzhou (China). Association for Computational Linguistics.
Cite (Informal):
BIOPSY - Biomarkers In Oncology: Pipeline for Structured Yielding (Chetwani & Mahmmdla, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.159.pdf