Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings

Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, Roger Zimmermann


Abstract
Keyphrase extraction is a fundamental task in natural language processing that facilitates mapping of documents to a set of representative phrases. In this paper, we present an unsupervised technique (Key2Vec) that leverages phrase embeddings for ranking keyphrases extracted from scientific articles. Specifically, we propose an effective way of processing text documents for training multi-word phrase embeddings that are used for thematic representation of scientific articles and ranking of keyphrases extracted from them using theme-weighted PageRank. Evaluations are performed on benchmark datasets producing state-of-the-art results.
Anthology ID:
N18-2100
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
634–639
Language:
URL:
https://aclanthology.org/N18-2100
DOI:
10.18653/v1/N18-2100
Bibkey:
Cite (ACL):
Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, and Roger Zimmermann. 2018. Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 634–639, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings (Mahata et al., NAACL 2018)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-4/N18-2100.pdf
Note:
 N18-2100.Notes.pdf