SaSAKE: Syntax and Semantics Aware Keyphrase Extraction from Research Papers

Santosh Tokala, Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, Partha Pratim Das


Abstract
Keyphrases in a research paper succinctly capture its primary content and also help index the paper at the concept level. Given the rapid rate at which scientific papers are published today, effective methods for automatically extracting keyphrases from research papers are important. In this paper, we present a novel method, Syntax and Semantics Aware Keyphrase Extraction (SaSAKE), to extract keyphrases from research papers. It uses a transformer architecture that stacks sentence encoders, which incorporate sequential information, with graph encoders, which incorporate syntactic and semantic dependency graph information. Incorporating these dependency graphs helps alleviate long-range dependency problems and identify the boundaries of multi-word keyphrases effectively. Experimental results on three benchmark datasets show that SaSAKE achieves state-of-the-art performance in keyphrase extraction from scientific papers.
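To make the idea of a graph encoder over dependency arcs more concrete, here is a minimal plain-Python sketch of one generic graph-convolution step, plus a decoder that turns token tags into multi-word keyphrases. It is an illustration under stated assumptions, not SaSAKE's implementation: the helper names (`dependency_adjacency`, `gcn_layer`, `decode_bio`), the row-normalized GCN update, the BIO tagging scheme, and the toy dependency arcs are all hypothetical, and the paper's graph encoders may differ in detail.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dependency_adjacency(n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Row-normalized, symmetric adjacency over n tokens with self-loops.

    `edges` are hypothetical (head, dependent) index pairs; in practice they
    would come from a dependency parser.
    """
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop keeps each token's own features
    for head, dep in edges:
        adj[head][dep] = 1.0  # treat each dependency arc as undirected
        adj[dep][head] = 1.0
    for row in adj:
        degree = sum(row)  # always >= 1 because of the self-loop
        for j in range(n):
            row[j] /= degree
    return adj


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x k) and a (k x p) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(hidden: Matrix, adj: Matrix, weight: Matrix) -> Matrix:
    """One generic graph-convolution step: ReLU(A_norm . H . W)."""
    out = matmul(matmul(adj, hidden), weight)
    return [[max(0.0, x) for x in row] for row in out]


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Group tokens tagged B (begin) / I (inside) into keyphrases; O = outside.

    An I tag with no open phrase is treated like O.
    """
    phrases: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                phrases.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    # Tokens 0 and 2 are linked by a (hypothetical) dependency arc.
    adj = dependency_adjacency(3, [(0, 2)])
    h = gcn_layer([[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]], adj, [[1.0, 0.0], [0.0, 1.0]])
    print(h)  # [[0.5, 1.0], [0.0, 1.0], [0.5, 1.0]]
    print(decode_bio(["syntax", "aware", "keyphrase", "extraction", "works"],
                     ["B", "I", "B", "I", "O"]))  # ['syntax aware', 'keyphrase extraction']
```

In the toy call, tokens 0 and 2 are not adjacent in the sequence but share a dependency arc, so a single layer already mixes their features. This is one way a dependency graph can shorten long-range paths between related words. The BIO decoder shows how per-token labels can mark where a multi-word keyphrase starts and ends.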
Anthology ID: 2020.coling-main.469
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 5372–5383
URL: https://aclanthology.org/2020.coling-main.469
DOI: 10.18653/v1/2020.coling-main.469
Cite (ACL): Santosh Tokala, Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, and Partha Pratim Das. 2020. SaSAKE: Syntax and Semantics Aware Keyphrase Extraction from Research Papers. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5372–5383, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): SaSAKE: Syntax and Semantics Aware Keyphrase Extraction from Research Papers (Tokala et al., COLING 2020)
PDF: https://preview.aclanthology.org/ingestion-script-update/2020.coling-main.469.pdf
Data: KP20k