How Well Do Large Language Models Extract Keywords? A Systematic Evaluation on Scientific Corpora

Nacef Ben Mansour; Hamed Rahimi; Motasem Alrahabi

Abstract

Automatic keyword extraction from scientific articles is pivotal for organizing scholarly archives, powering semantic search engines, and mapping interdisciplinary research trends. However, existing methods, including statistical and graph-based approaches, struggle with domain-specific challenges such as technical terminology, cross-disciplinary ambiguity, and dynamic scientific jargon. This paper presents an empirical comparison of traditional keyword extraction methods (e.g., TextRank and YAKE) with approaches based on Large Language Models (LLMs). We introduce a novel evaluation framework that combines fuzzy semantic matching based on Levenshtein distance with exact-match metrics (F1, precision, recall) to address inconsistencies in keyword normalization across scientific corpora. Through an extensive ablation study across nine different LLMs, we analyze their performance and associated costs. Our findings reveal that LLM-based methods consistently achieve superior precision and relevance compared to traditional approaches. This performance advantage suggests significant potential for improving scientific search systems and information retrieval in academic contexts.
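
The evaluation idea in the abstract, exact-match precision, recall, and F1 alongside a Levenshtein-based fuzzy match, can be illustrated with a minimal sketch. This is not the authors' implementation: the length-normalized similarity, the 0.8 threshold, the greedy one-to-one matching, the lowercasing, and the function names `levenshtein`, `similarity`, and `score` are all assumptions made for illustration.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Length-normalized similarity in [0, 1]; this normalization is an assumption."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def score(predicted: List[str], gold: List[str],
          threshold: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall, F1 with greedy one-to-one matching.

    threshold=1.0 reduces to case-insensitive exact match; lower values
    (e.g. the hypothetical 0.8) accept near-identical keyword variants.
    """
    unmatched = list(gold)
    hits = 0
    for keyword in predicted:
        best = max(unmatched, key=lambda g: similarity(keyword, g), default=None)
        if best is not None and similarity(keyword, best) >= threshold:
            hits += 1
            unmatched.remove(best)  # each gold keyword can be matched only once
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    predicted = ["neural networks", "keyword extraction", "graph theory"]
    gold = ["neural network", "keyword extraction"]
    print("exact:", score(predicted, gold, threshold=1.0))
    print("fuzzy:", score(predicted, gold, threshold=0.8))
```

In this toy example the plural "neural networks" fails the exact match but passes the fuzzy one, which is the kind of normalization inconsistency the abstract says the combined framework is meant to absorb.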

Anthology ID: 2025.aisd-main.2
Volume: Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities
Month: May
Year: 2025
Address: Albuquerque, New Mexico, USA
Editors: Peter Jansen, Bhavana Dalvi Mishra, Harsh Trivedi, Bodhisattwa Prasad Majumder, Tom Hope, Tushar Khot, Doug Downey, Eric Horvitz
Venues: AISD | WS
Publisher: Association for Computational Linguistics
Pages: 13–21
URL: https://preview.aclanthology.org/fix-sig-urls/2025.aisd-main.2/
Cite (ACL): Nacef Ben Mansour, Hamed Rahimi, and Motasem Alrahabi. 2025. How Well Do Large Language Models Extract Keywords? A Systematic Evaluation on Scientific Corpora. In Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities, pages 13–21, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): How Well Do Large Language Models Extract Keywords? A Systematic Evaluation on Scientific Corpora (Mansour et al., AISD 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.aisd-main.2.pdf
