Data Augmentation for Low-Resource Keyphrase Generation

Krishna Garg, Jishnu Ray Chowdhury, Cornelia Caragea


Abstract
Keyphrase generation is the task of summarizing the contents of a given article into a few salient phrases (or keyphrases). Existing work on the task mostly relies on large-scale annotated datasets, which are not easy to acquire. Few works address keyphrase generation in low-resource settings, and those that do still rely on large amounts of additional unlabeled data for pretraining and on automatic methods for pseudo-annotation. In this paper, we present data augmentation strategies designed specifically for keyphrase generation in purely resource-constrained domains. We design techniques that use the full text of the articles to improve both present and absent keyphrase generation. We test our approach comprehensively on three datasets and show that the data augmentation strategies consistently improve on state-of-the-art performance. We release our source code at https://github.com/kgarg8/kpgen-lowres-data-aug.
Anthology ID: 2023.findings-acl.534
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8442–8455
URL: https://aclanthology.org/2023.findings-acl.534
DOI: 10.18653/v1/2023.findings-acl.534
Cite (ACL): Krishna Garg, Jishnu Ray Chowdhury, and Cornelia Caragea. 2023. Data Augmentation for Low-Resource Keyphrase Generation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8442–8455, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Data Augmentation for Low-Resource Keyphrase Generation (Garg et al., Findings 2023)
PDF: https://preview.aclanthology.org/nschneid-patch-5/2023.findings-acl.534.pdf