Segmentation Matters: Exploring LLM-Based Strategies for Temporal Clinical Event Identification in Oncology Reports
Cristiano Bellucci, Francesco Madeddu, Chiara Iacomini, Carlotta Masciocchi, Stefano Patarnello, Massimo Bernaschi, Mario Santoro, Livia Lilli
Abstract
Processing unstructured clinical narratives remains a major challenge in medical Natural Language Processing (NLP), particularly when critical information is embedded within lengthy and heterogeneous reports. Clinical notes often describe key diagnostic and therapeutic events through a verbose narrative, making automatic event identification difficult. In this work, we frame the identification of clinical events as a text segmentation task. We conduct a comparative study of three segmentation strategies applied to oncology reports: (i) a fully regex-based approach, (ii) a cascaded regex-LLM pipeline, and (iii) the same cascade architecture augmented with a recovery mechanism to mitigate LLM rephrasing. Segmentation quality is evaluated using complementary structural metrics (Pk, WindowDiff, Boundary Similarity, Segment Count Accuracy, and Text Overlap IoU), and its impact is also observed on downstream segment tagging, performed to identify the corresponding event type (e.g., surgery, biopsy, imaging, treatment, laboratory). The results demonstrate the high potential of LLM-based approaches, particularly in preserving semantic coherence within segments and in generalizing to new data sources. However, regex-based segmentation achieves higher performance according to structural segmentation metrics, which also leads to better downstream clinical event identification. Overall, these results highlight the critical role of context-adaptive, high-quality segmentation strategies in structuring verbose clinical narratives and in accurately identifying key patient events.
- Anthology ID:
- 2026.bionlp-1.47
- Volume:
- BioNLP 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California
- Editors:
- Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
- Venues:
- BioNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 595–604
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.47/
- Cite (ACL):
- Cristiano Bellucci, Francesco Madeddu, Chiara Iacomini, Carlotta Masciocchi, Stefano Patarnello, Massimo Bernaschi, Mario Santoro, and Livia Lilli. 2026. Segmentation Matters: Exploring LLM-Based Strategies for Temporal Clinical Event Identification in Oncology Reports. In BioNLP 2026, pages 595–604, San Diego, California. Association for Computational Linguistics.
- Cite (Informal):
- Segmentation Matters: Exploring LLM-Based Strategies for Temporal Clinical Event Identification in Oncology Reports (Bellucci et al., BioNLP 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.47.pdf
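
Two of the structural metrics named in the abstract, Pk and WindowDiff, can be illustrated with a minimal sketch. It follows the standard textbook definitions of the two metrics and is not the authors' evaluation code. The representation of a segmentation as a list of segment lengths (counted in sentence-like units), the function names, and the default window size (half the mean reference segment length) are assumptions made for illustration only.

```python
def segment_ids(lengths):
    """Expand segment lengths, e.g. [2, 3], into per-unit ids [0, 0, 1, 1, 1]."""
    return [seg for seg, n in enumerate(lengths) for _ in range(n)]


def default_window(ref_lengths):
    """Assumed convention: half the mean reference segment length, at least 1."""
    return max(1, round(sum(ref_lengths) / len(ref_lengths) / 2))


def _prepare(ref_lengths, hyp_lengths, k):
    ref, hyp = segment_ids(ref_lengths), segment_ids(hyp_lengths)
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must cover the same units")
    if k is None:
        k = default_window(ref_lengths)
    if not 0 < k < len(ref):
        raise ValueError("window size k must satisfy 0 < k < number of units")
    return ref, hyp, k


def pk(ref_lengths, hyp_lengths, k=None):
    """Fraction of windows where reference and hypothesis disagree on whether
    units i and i+k belong to the same segment (lower is better)."""
    ref, hyp, k = _prepare(ref_lengths, hyp_lengths, k)
    windows = len(ref) - k
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]) for i in range(windows)
    )
    return errors / windows


def window_diff(ref_lengths, hyp_lengths, k=None):
    """Fraction of windows where the number of boundaries between units i and
    i+k differs between reference and hypothesis (lower is better)."""
    ref, hyp, k = _prepare(ref_lengths, hyp_lengths, k)
    windows = len(ref) - k
    # Segment ids grow by one at each boundary, so the id difference across a
    # window equals the number of boundaries inside it.
    errors = sum(
        (ref[i + k] - ref[i]) != (hyp[i + k] - hyp[i]) for i in range(windows)
    )
    return errors / windows


if __name__ == "__main__":
    reference = [2, 4]      # hypothetical gold segmentation of 6 units
    hypothesis = [2, 1, 3]  # hypothetical system output with one extra boundary
    print(pk(reference, hypothesis, k=2))
    print(window_diff(reference, hypothesis, k=2))
```

In this toy case the hypothesis places one spurious boundary. WindowDiff counts boundaries per window and so penalizes the extra boundary more heavily than Pk, which only checks whether the two window ends fall in the same segment.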