Abstract
The current advancement in abstractive document summarization depends to a large extent on a considerable amount of human-annotated datasets. However, the creation of large-scale datasets is often not feasible in closed domains, such as medical and healthcare domains, where human annotation requires domain expertise. This paper presents a novel data selection strategy to generate diverse and semantic questions in a low-resource setting with the aim to summarize healthcare questions. Our method exploits the concept of guided semantic-overlap and diversity-based objective functions to optimally select the informative and diverse set of synthetic samples for data augmentation. Our extensive experiments on benchmark healthcare question summarization datasets demonstrate the effectiveness of our proposed data selection strategy by achieving new state-of-the-art results. Our human evaluation shows that our method generates diverse, fluent, and informative summarized questions.
- Anthology ID:
- 2022.coling-1.255
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 2892–2905
- URL:
- https://aclanthology.org/2022.coling-1.255
- Cite (ACL):
- Shweta Yadav and Cornelia Caragea. 2022. Towards Summarizing Healthcare Questions in Low-Resource Setting. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2892–2905, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- Towards Summarizing Healthcare Questions in Low-Resource Setting (Yadav & Caragea, COLING 2022)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2022.coling-1.255.pdf