Abstract
Sequence-to-sequence models have led to significant progress in keyphrase generation, but it remains unknown whether they are reliable enough to be beneficial for document retrieval. This study provides empirical evidence that such models can significantly improve retrieval performance, and introduces a new extrinsic evaluation framework that allows for a better understanding of the limitations of keyphrase generation models. Using this framework, we point out and discuss the difficulties encountered when supplementing documents with keyphrases that are not present in the text, and when generalizing models across domains. Our code is available at https://github.com/boudinfl/ir-using-kg
- Anthology ID:
- 2020.acl-main.105
- Volume:
- Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
- Month:
- July
- Year:
- 2020
- Address:
- Online
- Editors:
- Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1118–1126
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2020.acl-main.105/
- DOI:
- 10.18653/v1/2020.acl-main.105
- Cite (ACL):
- Florian Boudin, Ygor Gallina, and Akiko Aizawa. 2020. Keyphrase Generation for Scientific Document Retrieval. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1118–1126, Online. Association for Computational Linguistics.
- Cite (Informal):
- Keyphrase Generation for Scientific Document Retrieval (Boudin et al., ACL 2020)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2020.acl-main.105.pdf
- Code
- boudinfl/ir-using-kg
- Data
- KP20k