Unsupervised Domain Adaptation for Keyphrase Generation using Citation Contexts

Florian Boudin, Akiko Aizawa


Abstract
Adapting keyphrase generation models to new domains typically involves few-shot fine-tuning with in-domain labeled data. However, annotating documents with keyphrases is often prohibitively expensive and impractical, requiring expert annotators. This paper presents silk, an unsupervised method designed to address this issue by extracting silver-standard keyphrases from citation contexts to create synthetic labeled data for domain adaptation. Extensive experiments across three distinct domains demonstrate that our method yields high-quality synthetic samples, resulting in significant and consistent improvements in in-domain performance over strong baselines.
Anthology ID:
2024.findings-emnlp.33
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
598–614
URL:
https://preview.aclanthology.org/add-emnlp-2024-awards/2024.findings-emnlp.33/
DOI:
10.18653/v1/2024.findings-emnlp.33
Cite (ACL):
Florian Boudin and Akiko Aizawa. 2024. Unsupervised Domain Adaptation for Keyphrase Generation using Citation Contexts. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 598–614, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Domain Adaptation for Keyphrase Generation using Citation Contexts (Boudin & Aizawa, Findings 2024)
PDF:
https://preview.aclanthology.org/add-emnlp-2024-awards/2024.findings-emnlp.33.pdf