Hamed Rahimi
2025
How Well Do Large Language Models Extract Keywords? A Systematic Evaluation on Scientific Corpora
Nacef Ben Mansour | Hamed Rahimi | Motasem Alrahabi
Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities
Automatic keyword extraction from scientific articles is pivotal for organizing scholarly archives, powering semantic search engines, and mapping interdisciplinary research trends. However, existing methods, including statistical and graph-based approaches, struggle with domain-specific challenges such as technical terminology, cross-disciplinary ambiguity, and dynamic scientific jargon. This paper presents an empirical comparison of traditional keyword extraction methods (e.g., TextRank and YAKE) with approaches based on Large Language Models (LLMs). We introduce a novel evaluation framework that combines fuzzy semantic matching based on Levenshtein distance with exact-match metrics (F1, precision, recall) to address inconsistencies in keyword normalization across scientific corpora. Through an extensive ablation study across nine different LLMs, we analyze their performance and associated costs. Our findings reveal that LLM-based methods consistently achieve superior precision and relevance compared to traditional approaches. This performance advantage suggests significant potential for improving scientific search systems and information retrieval in academic contexts.
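
The abstract's evaluation framework pairs exact-match scoring with Levenshtein-based fuzzy matching. The Python sketch below shows one way such a combined metric could be computed; the 0.8 similarity threshold, the lowercasing normalization, and the greedy one-to-one matching are illustrative assumptions rather than the paper's exact protocol.

```python
# Minimal sketch: keyword-extraction evaluation with exact-match and
# Levenshtein-based fuzzy matching. Threshold and matching strategy are
# illustrative assumptions, not the paper's protocol.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def prf(predicted, gold, threshold=1.0):
    """Precision/recall/F1 with greedy one-to-one matching.
    threshold=1.0 reproduces exact match; lower values give fuzzy match."""
    predicted = [p.lower().strip() for p in predicted]
    gold = [g.lower().strip() for g in gold]
    unmatched = set(range(len(gold)))
    tp = 0
    for p in predicted:
        best = max(unmatched, key=lambda i: similarity(p, gold[i]), default=None)
        if best is not None and similarity(p, gold[best]) >= threshold:
            tp += 1
            unmatched.remove(best)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

pred = ["large language models", "keyword extraction", "semantic search"]
gold = ["keyword extraction", "large language model", "information retrieval"]
print("exact:", prf(pred, gold, threshold=1.0))
print("fuzzy:", prf(pred, gold, threshold=0.8))
```

In this toy example, "large language models" misses under exact match but counts under the fuzzy criterion, which is the kind of normalization inconsistency the combined framework is meant to absorb.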
2024
Contextualized Topic Coherence Metrics
Hamed Rahimi | David Mimno | Jacob Hoover | Hubert Naacke | Camelia Constantin | Bernd Amann
Findings of the Association for Computational Linguistics: EACL 2024
This article proposes a new family of LLM-based topic coherence metrics called Contextualized Topic Coherence (CTC), inspired by standard human topic evaluation methods. CTC metrics simulate human-centered coherence evaluation while maintaining the efficiency of other automated methods. We compare our CTC metrics against five baseline metrics on seven topic models and show that CTC metrics better reflect human judgment, particularly for topics extracted from short text collections, by avoiding highly scored topics that are meaningless to humans.
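
The CTC abstract describes metrics that use an LLM as a stand-in for human coherence judges. The sketch below illustrates that general idea under explicit assumptions: the prompt wording, the 1-5 rating scale, and the `ask_llm` callable are placeholders, not the CTC metric definitions from the paper.

```python
# Illustrative sketch of an LLM-judged topic coherence score in the spirit
# of CTC: ask an LLM to rate how coherent a topic's top words are, then
# average over topics. Prompt, scale, and `ask_llm` are assumptions.
import re
from statistics import mean
from typing import Callable, List

def llm_topic_coherence(topics: List[List[str]],
                        ask_llm: Callable[[str], str]) -> float:
    """Average LLM coherence rating over topics, normalized to [0, 1]."""
    scores = []
    for top_words in topics:
        prompt = (
            "On a scale from 1 (unrelated words) to 5 (a single clear theme), "
            "rate how semantically coherent this set of topic words is. "
            f"Answer with a single number.\nWords: {', '.join(top_words)}"
        )
        reply = ask_llm(prompt)
        match = re.search(r"[1-5]", reply)
        if match:
            scores.append((int(match.group()) - 1) / 4)  # map 1-5 to 0-1
    return mean(scores) if scores else 0.0

# Usage with a placeholder judge; in practice `ask_llm` would call a real LLM.
if __name__ == "__main__":
    topics = [["neuron", "synapse", "cortex", "axon", "dendrite"],
              ["apple", "justice", "turbine", "sonnet", "glacier"]]
    fake_judge = lambda prompt: "4" if "neuron" in prompt else "1"
    print(llm_topic_coherence(topics, fake_judge))
```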