Neural Document Segmentation Using Weighted Sliding Windows with Transformer Encoders
Saeed Abbasi, Aijun An, Heidar Davoudi, Ron Di Carlantonio, Gary Farmaner
Abstract
We introduce a novel Transformer-based method for document segmentation, tailored for practical, real-world applications. This method utilizes overlapping text sequences with a unique position-aware weighting mechanism to enhance segmentation accuracy. Through comprehensive experiments on both public and proprietary datasets, we demonstrate significant improvements, establishing new state-of-the-art standards by achieving up to a 10% increase in segmentation F1 score compared to existing methods. Additionally, we explore the application of our segmentation method in downstream retrieval-augmented question answering tasks, where it improves the quality of generated responses by 5% while achieving up to four times greater efficiency. These results underscore our model’s potential as a robust and scalable solution for real-world text segmentation challenges.
- Anthology ID:
- 2025.coling-industry.67
- Volume:
- Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
- Month:
- January
- Year:
- 2025
- Address:
- Abu Dhabi, UAE
- Editors:
- Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
- Venue:
- COLING
- Publisher:
- Association for Computational Linguistics
- Pages:
- 807–816
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.coling-industry.67/
- Cite (ACL):
- Saeed Abbasi, Aijun An, Heidar Davoudi, Ron Di Carlantonio, and Gary Farmaner. 2025. Neural Document Segmentation Using Weighted Sliding Windows with Transformer Encoders. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 807–816, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal):
- Neural Document Segmentation Using Weighted Sliding Windows with Transformer Encoders (Abbasi et al., COLING 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.coling-industry.67.pdf