CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models
Potsawee Manakul, Yassir Fathullah, Adian Liusie, Vyas Raina, Vatsal Raina, Mark Gales
Abstract
In this paper, we consider the challenge of summarizing patients' medical progress notes in a limited-data setting. For the Problem List Summarization task (shared task 1A) at the BioNLP Workshop 2023, we demonstrate that ClinicalT5 fine-tuned on 765 medical clinic notes outperforms other extractive, abstractive and zero-shot baselines, yielding reasonable baseline systems for medical note summarization. Further, we introduce Hierarchical Ensemble of Summarization Models (HESM), consisting of token-level ensembles of diverse fine-tuned ClinicalT5 models, followed by Minimum Bayes Risk (MBR) decoding. Our HESM approach led to a considerable summarization performance boost and, when evaluated on held-out challenge data, achieved a ROUGE-L of 32.77, making it the best-performing system on the shared task leaderboard.
- Anthology ID:
- 2023.bionlp-1.51
- Volume:
- The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Dina Demner-Fushman, Sophia Ananiadou, Kevin Cohen
- Venue:
- BioNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 516–523
- URL:
- https://aclanthology.org/2023.bionlp-1.51
- DOI:
- 10.18653/v1/2023.bionlp-1.51
- Cite (ACL):
- Potsawee Manakul, Yassir Fathullah, Adian Liusie, Vyas Raina, Vatsal Raina, and Mark Gales. 2023. CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 516–523, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models (Manakul et al., BioNLP 2023)
- PDF:
- https://preview.aclanthology.org/bionlp-24-ingestion/2023.bionlp-1.51.pdf
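
The abstract names Minimum Bayes Risk (MBR) decoding as the final stage of HESM but does not spell out the candidate set or the utility function. The following is a minimal sketch of the general MBR idea only, not the authors' implementation: given several candidate summaries (here assumed to come from different ensembles), it picks the one with the highest average similarity to the others. The function names `lcs_length`, `rouge_l_f1` and `mbr_select` are hypothetical, and the similarity is a simplified ROUGE-L F1 on whitespace tokens chosen purely for illustration.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on whitespace tokens (assumption: no stemming)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: List[str]) -> str:
    """Return the candidate with the highest mean similarity to the others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i: int) -> float:
        # Treat every other candidate as a pseudo-reference with equal weight.
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(rouge_l_f1(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]
```

For example, `mbr_select(["sepsis acute kidney injury", "sepsis acute kidney injury anemia", "sepsis anemia"])` selects the second candidate, since it overlaps most with the other two. Weighting all candidates equally is a simplifying assumption; a probability-weighted expectation is another common choice.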