Page Stream Segmentation with LLMs: Challenges and Applications in Insurance Document Automation
Hunter Heidenreich, Ratish Dalvi, Nikhil Verma, Yosheb Getachew
Abstract
Page Stream Segmentation (PSS) is critical for automating document processing in industries like insurance, where unstructured document collections are common. This paper explores the use of large language models (LLMs) for PSS, applying parameter-efficient fine-tuning to real-world insurance data. Our experiments show that LLMs outperform baseline models in page- and stream-level segmentation accuracy. However, stream-level calibration remains challenging, especially for high-stakes applications. We evaluate post-hoc calibration and Monte Carlo dropout, finding limited improvement. Future work will integrate active learning to enhance model calibration and support deployment in practical settings.
- Anthology ID:
- 2025.coling-industry.26
- Volume:
- Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
- Month:
- January
- Year:
- 2025
- Address:
- Abu Dhabi, UAE
- Editors:
- Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
- Venue:
- COLING
- Publisher:
- Association for Computational Linguistics
- Pages:
- 305–317
- URL:
- https://preview.aclanthology.org/ingest_wac_2008/2025.coling-industry.26/
- Cite (ACL):
- Hunter Heidenreich, Ratish Dalvi, Nikhil Verma, and Yosheb Getachew. 2025. Page Stream Segmentation with LLMs: Challenges and Applications in Insurance Document Automation. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 305–317, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal):
- Page Stream Segmentation with LLMs: Challenges and Applications in Insurance Document Automation (Heidenreich et al., COLING 2025)
- PDF:
- https://preview.aclanthology.org/ingest_wac_2008/2025.coling-industry.26.pdf
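The abstract distinguishes page-level from stream-level segmentation accuracy. A minimal sketch can illustrate why the stream level is the harder target. It rests on assumptions that are ours and not the paper's. First, PSS is encoded as one binary label per page, where 1 means the page starts a new document. Second, page-level accuracy pools those per-page decisions across streams. Third, stream-level accuracy counts a stream as correct only if its entire segmentation matches exactly. The helper names `to_segments`, `page_accuracy`, and `stream_accuracy` are hypothetical, and the paper's exact metric definitions may differ.

```python
from typing import List, Sequence

# Assumed encoding (hypothetical): labels[i] == 1 means page i starts a new
# document; the first page of a stream always opens a document regardless.


def to_segments(labels: Sequence[int]) -> List[List[int]]:
    """Group page indices into documents using per-page boundary labels."""
    segments: List[List[int]] = []
    for i, starts_new in enumerate(labels):
        if starts_new or not segments:
            segments.append([i])  # open a new document at page i
        else:
            segments[-1].append(i)  # continue the current document
    return segments


def page_accuracy(pred: Sequence[Sequence[int]], gold: Sequence[Sequence[int]]) -> float:
    """Fraction of individual page boundary decisions that are correct, pooled over streams."""
    correct = total = 0
    for p, g in zip(pred, gold):
        correct += sum(int(a == b) for a, b in zip(p, g))
        total += len(g)
    return correct / total


def stream_accuracy(pred: Sequence[Sequence[int]], gold: Sequence[Sequence[int]]) -> float:
    """Fraction of streams whose predicted segmentation matches the gold one exactly."""
    exact = sum(to_segments(p) == to_segments(g) for p, g in zip(pred, gold))
    return exact / len(gold)


if __name__ == "__main__":
    # Toy data, invented for illustration: two streams of 5 and 3 pages.
    gold = [[1, 0, 1, 0, 0], [1, 1, 0]]
    pred = [[1, 0, 1, 0, 1], [1, 1, 0]]  # one spurious boundary on the last page
    print(page_accuracy(pred, gold))    # 0.875
    print(stream_accuracy(pred, gold))  # 0.5
```

In this toy case a single spurious boundary costs one page-level error out of eight. It also makes the whole first stream wrong, halving stream-level accuracy. This is one plausible intuition for why stream-level results, and the confidence estimates attached to them, are harder to get right than page-level ones.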