Dealing with Data Scarcity in Spoken Question Answering

Merve Ünlü Menevşe, Yusufcan Manav, Ebru Arisoy, Arzucan Özgür


Abstract
This paper focuses on dealing with data scarcity in spoken question answering (QA) using automatic question-answer generation and a carefully selected fine-tuning strategy that leverages limited annotated data (paragraphs and question-answer pairs). Spoken QA is a challenging task due to using spoken documents, i.e., erroneous automatic speech recognition (ASR) transcriptions, and the scarcity of spoken QA data. We propose a framework for utilizing limited annotated data effectively to improve spoken QA performance. To deal with data scarcity, we train a question-answer generation model with annotated data and then produce large amounts of question-answer pairs from unannotated data (paragraphs). Our experiments demonstrate that incorporating limited annotated data and the automatically generated data through a carefully selected fine-tuning strategy leads to 5.5% relative F1 gain over the model trained only with annotated data. Moreover, the proposed framework is also effective in high ASR errors.
Anthology ID:
2024.lrec-main.397
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
4449–4455
Language:
URL:
https://aclanthology.org/2024.lrec-main.397
DOI:
Bibkey:
Cite (ACL):
Merve Ünlü Menevşe, Yusufcan Manav, Ebru Arisoy, and Arzucan Özgür. 2024. Dealing with Data Scarcity in Spoken Question Answering. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4449–4455, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Dealing with Data Scarcity in Spoken Question Answering (Ünlü Menevşe et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2024.lrec-main.397.pdf