Post-Training with Interrogative Sentences for Enhancing BART-based Korean Question Generator

Gyu-Min Park, Seong-Eun Hong, Seong-Bae Park


Abstract
Pre-trained language models such as KoBART often fail to generate well-formed interrogative sentences when applied to Korean question generation. This is mainly because such language models have far more exposure to declarative sentences than to interrogative ones. Therefore, this paper proposes a novel post-training of KoBART to enhance it for Korean question generation. The enhancement of KoBART is accomplished in three ways: (i) the introduction of a question infilling objective that forces KoBART to focus more on the structure of interrogative sentences, (ii) augmentation of the training data for question generation with another data set to cope with the scarcity of training instances for post-training, and (iii) the introduction of a Korean spacing objective to make KoBART understand the linguistic features of Korean. Since there is no standard data set for Korean question generation, this paper also proposes KorQuAD-QG, a new data set for this task, to verify the performance of the proposed post-training. Our code is publicly available at https://github.com/gminipark/post_training_qg
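The question infilling and Korean spacing objectives described in the abstract can be sketched as simple (source, target) pair constructions for a seq2seq model such as KoBART. The sketch below is a minimal illustration, not the authors' released implementation; the function names and the `<mask>` token are assumptions.

```python
# Illustrative sketch (not the authors' code) of how the two post-training
# objectives could build (source, target) pairs for a BART-style model.

MASK = "<mask>"  # a BART-style infilling mask token is assumed here


def question_infilling_pair(context: str, question: str) -> tuple[str, str]:
    """Question infilling: the interrogative sentence is masked out of the
    input, and the decoder must reconstruct it, pushing the model to learn
    the structure of interrogative sentences."""
    return f"{context} {MASK}", question


def spacing_pair(sentence: str) -> tuple[str, str]:
    """Korean spacing: spaces are stripped from the input, and the decoder
    must restore the original spacing, exposing the model to Korean word
    boundaries."""
    return sentence.replace(" ", ""), sentence


if __name__ == "__main__":
    src, tgt = question_infilling_pair("지문 텍스트입니다.", "무엇에 관한 지문입니까?")
    print(src)  # 지문 텍스트입니다. <mask>
    print(tgt)  # 무엇에 관한 지문입니까?
    print(spacing_pair("한국어 띄어쓰기 예시"))
```

In practice, the resulting pairs would be tokenized and fed to the encoder (source) and decoder (target) during post-training, alongside the augmented question-generation data.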
Anthology ID:
2022.aacl-short.26
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month:
November
Year:
2022
Address:
Online only
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
202–209
URL:
https://aclanthology.org/2022.aacl-short.26
Cite (ACL):
Gyu-Min Park, Seong-Eun Hong, and Seong-Bae Park. 2022. Post-Training with Interrogative Sentences for Enhancing BART-based Korean Question Generator. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 202–209, Online only. Association for Computational Linguistics.
Cite (Informal):
Post-Training with Interrogative Sentences for Enhancing BART-based Korean Question Generator (Park et al., AACL-IJCNLP 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.aacl-short.26.pdf
Software:
 2022.aacl-short.26.Software.zip