Match More, Extract Better! Hybrid Matching Model for Open Domain Web Keyphrase Extraction

Mingyang Song, Liping Jing, Yi Feng


Abstract
Keyphrase extraction aims to automatically extract salient phrases that represent the critical information in a source document. Identifying salient phrases is challenging because documents contain a lot of noisy information, which leads to incorrect extractions. To address this issue, we propose HybridMatch, a hybrid matching model for keyphrase extraction that combines representation-focused and interaction-focused matching modules in a unified framework to improve keyphrase extraction performance. Specifically, HybridMatch comprises (1) a PLM-based Siamese encoder component that represents both candidate phrases and documents, (2) an interaction-focused matching (IM) component that estimates word-level matches between candidate phrases and the corresponding document, and (3) a representation-focused matching (RM) component that captures the context-aware semantic relatedness of each candidate keyphrase at the phrase level. Extensive experimental results on the OpenKP dataset demonstrate that HybridMatch outperforms recent state-of-the-art keyphrase extraction baselines. Furthermore, we discuss the performance of large language models in keyphrase extraction based on recent studies and our experiments.
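
The abstract names three components but does not say how their scores are computed or combined. The following is a minimal illustrative sketch of the general idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the toy hash-based `embed_word` stands in for the PLM-based Siamese encoder, mean pooling and cosine similarity stand in for the RM component, a max-over-document-words word match stands in for the IM component, and the mixing weight `alpha` is hypothetical.

```python
import math
import zlib

DIM = 16  # toy embedding size (assumption; a real PLM would be far larger)


def embed_word(word):
    """Toy deterministic word embedding; a stand-in for a shared PLM encoder."""
    vec = []
    for i in range(DIM):
        h = zlib.crc32(f"{word.lower()}|{i}".encode("utf-8"))
        vec.append((h % 2001) / 1000.0 - 1.0)  # value in [-1, 1]
    return vec


def mean_pool(vectors):
    """Average a non-empty list of word vectors into one vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def representation_score(phrase_words, doc_words):
    """Phrase-level matching: compare pooled phrase and document vectors."""
    phrase_vec = mean_pool([embed_word(w) for w in phrase_words])
    doc_vec = mean_pool([embed_word(w) for w in doc_words])
    return cosine(phrase_vec, doc_vec)


def interaction_score(phrase_words, doc_words):
    """Word-level matching: best document match per phrase word, averaged."""
    doc_vecs = [embed_word(w) for w in doc_words]
    best = []
    for w in phrase_words:
        pv = embed_word(w)
        best.append(max(cosine(pv, dv) for dv in doc_vecs))
    return sum(best) / len(best)


def hybrid_score(phrase_words, doc_words, alpha=0.5):
    """Hypothetical combination: weighted sum of the two matching scores."""
    rm = representation_score(phrase_words, doc_words)
    im = interaction_score(phrase_words, doc_words)
    return alpha * rm + (1.0 - alpha) * im


def rank_candidates(candidates, doc_words, alpha=0.5):
    """Return (score, phrase) pairs sorted from highest to lowest score."""
    scored = [(hybrid_score(c, doc_words, alpha), " ".join(c)) for c in candidates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    doc = "keyphrase extraction selects salient phrases from a web page".split()
    candidates = [["keyphrase", "extraction"], ["web", "page"], ["random", "noise"]]
    for score, phrase in rank_candidates(candidates, doc):
        print(f"{score:.3f}  {phrase}")
```

Because the embeddings here are hash-based rather than learned, the ranking only reflects lexical overlap with the document. The sketch is meant to show how a word-level and a phrase-level score can feed one ranking, not to reproduce any reported result.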
Anthology ID:
2024.findings-acl.2
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
17–27
URL:
https://aclanthology.org/2024.findings-acl.2
Cite (ACL):
Mingyang Song, Liping Jing, and Yi Feng. 2024. Match More, Extract Better! Hybrid Matching Model for Open Domain Web Keyphrase Extraction. In Findings of the Association for Computational Linguistics ACL 2024, pages 17–27, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Match More, Extract Better! Hybrid Matching Model for Open Domain Web Keyphrase Extraction (Song et al., Findings 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.findings-acl.2.pdf