Abstract
This paper details a Consumer Health Question (CHQ) summarization model submitted to MEDIQA 2021 shared task 1: Question Summarization. Many CHQs consist of multiple sentences containing typos or unnecessary information, which can interfere with automated question answering systems. Question summarization mitigates this issue by removing such information, helping automated systems produce more accurate results. Our summarization approach focuses on applying multiple pre-processing techniques to the input, including question focus identification, and on developing an ensemble method that combines question focus with an abstractive summarization method. We use the state-of-the-art abstractive summarization model PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization) to generate abstractive summaries. Our experiments show that our ensemble method, which combines abstractive summarization with question focus identification, improves performance over summarization alone. Our model achieves a ROUGE-2 F-measure of 11.14% on the official test dataset.
- Anthology ID:
- 2021.bionlp-1.37
- Volume:
- Proceedings of the 20th Workshop on Biomedical Language Processing
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 320–327
- URL:
- https://aclanthology.org/2021.bionlp-1.37
- DOI:
- 10.18653/v1/2021.bionlp-1.37
- Cite (ACL):
- Jooyeon Lee, Huong Dang, Ozlem Uzuner, and Sam Henry. 2021. MNLP at MEDIQA 2021: Fine-Tuning PEGASUS for Consumer Health Question Summarization. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 320–327, Online. Association for Computational Linguistics.
- Cite (Informal):
- MNLP at MEDIQA 2021: Fine-Tuning PEGASUS for Consumer Health Question Summarization (Lee et al., BioNLP 2021)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2021.bionlp-1.37.pdf
- Data
- MeQSum
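The abstract reports performance as a ROUGE-2 F-measure, which scores a candidate summary by bigram overlap with a reference summary. A minimal sketch of that metric (a simplified illustration, not the official ROUGE implementation, which also handles stemming and multiple references):

```python
from collections import Counter

def bigrams(tokens):
    """Count adjacent word pairs in a token list."""
    return Counter(zip(tokens, tokens[1:]))

def rouge2_f(candidate: str, reference: str) -> float:
    """Simplified ROUGE-2 F-measure: harmonic mean of bigram
    precision and recall between candidate and reference."""
    cand = bigrams(candidate.split())
    ref = bigrams(reference.split())
    overlap = sum((cand & ref).values())  # clipped bigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Example with hypothetical CHQ-style summaries:
score = rouge2_f("what are flu symptoms",
                 "what are the symptoms of flu")
# shares one bigram ("what are") out of 3 candidate / 5 reference bigrams
```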