Representation Learning for Resource-Constrained Keyphrase Generation

Di Wu; Wasi Ahmad; Sunipa Dev; Kai-Wei Chang

Abstract

State-of-the-art keyphrase generation methods generally depend on large annotated datasets, limiting their performance in domains with limited annotated data. To overcome this challenge, we design a data-oriented approach that first identifies salient information using retrieval-based corpus-level statistics, and then learns a task-specific intermediate representation based on a pre-trained language model using large-scale unlabeled documents. We introduce salient span recovery and salient span prediction as denoising training objectives that condense the intra-article and inter-article knowledge essential for keyphrase generation. Through experiments on multiple keyphrase generation benchmarks, we show the effectiveness of the proposed approach for facilitating low-resource keyphrase generation and zero-shot domain adaptation. Our method especially benefits the generation of absent keyphrases, approaching the performance of models trained with large training sets.
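The abstract names two denoising objectives but does not spell out how their training pairs are built. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes plain TF-IDF over a small toy corpus as a stand-in for the retrieval-based corpus-level statistics, word-level masking in place of span-level masking, and a hypothetical `<mask>` token and ` ; ` separator. The function names `salient_words` and `make_examples` are illustrative only.

```python
import math
import re
from collections import Counter

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def salient_words(doc, corpus, k=2):
    """Return the k words of `doc` with the highest TF-IDF score.

    Assumption: plain TF-IDF over `corpus` stands in for the
    retrieval-based corpus-level statistics mentioned in the abstract.
    """
    tokens = tokenize(doc)
    tf = Counter(tokens)
    doc_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for s in doc_sets if word in s)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        scores[word] = (count / len(tokens)) * idf
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


def make_examples(doc, salient):
    """Build (input, target) pairs for the two assumed objectives.

    Recovery: masked document -> original document.
    Prediction: masked document -> the salient words themselves.
    """
    tokens = tokenize(doc)
    salient_set = set(salient)
    corrupted = " ".join(MASK if t in salient_set else t for t in tokens)
    # Keep salient words in order of first appearance in the document.
    ordered = []
    for t in tokens:
        if t in salient_set and t not in ordered:
            ordered.append(t)
    recovery = (corrupted, " ".join(tokens))
    prediction = (corrupted, " ; ".join(ordered))
    return recovery, prediction


if __name__ == "__main__":
    corpus = [
        "neural networks learn representations",
        "neural models generate text",
        "keyphrase generation with neural networks",
    ]
    doc = corpus[2]
    salient = salient_words(doc, corpus, k=2)
    for source, target in make_examples(doc, salient):
        print(source, "->", target)
```

In this sketch both objectives share the same corrupted input and differ only in the target: the recovery pair asks a sequence-to-sequence model to reconstruct the full text, while the prediction pair asks it to emit only the salient words, which is closer in form to keyphrase generation output.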

Anthology ID: 2022.findings-emnlp.49
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 700–716
URL: https://aclanthology.org/2022.findings-emnlp.49
Cite (ACL): Di Wu, Wasi Ahmad, Sunipa Dev, and Kai-Wei Chang. 2022. Representation Learning for Resource-Constrained Keyphrase Generation. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 700–716, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Representation Learning for Resource-Constrained Keyphrase Generation (Wu et al., Findings 2022)
PDF: https://preview.aclanthology.org/ingestion-script-update/2022.findings-emnlp.49.pdf