Hamed Alhoori
2025
Predicting The Scholarly Impact of Research Papers Using Retrieval-Augmented LLMs
Tamjid Azad
|
Ibrahim Al Azher
|
Sagnik Ray Choudhury
|
Hamed Alhoori
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
Assessing a research paper’s scholarly impact is an important phase in the scientific research process; however, metrics typically take some time after publication to accurately capture the impact. Our study examines how Large Language Models (LLMs) can predict scholarly impact accurately. We utilize Retrieval-Augmented Generation (RAG) to examine the degree to which the LLM performance improves compared to zero-shot prompting. Results show that LLama3-8b with RAG achieved the best overall performance, while Gemma-7b benefited the most from RAG, exhibiting the most significant reduction in Mean Absolute Error (MAE). Our findings suggest that retrieval-augmented LLMs offer a promising approach for early research evaluation. Our code and dataset for this project are publicly available.
2022
Reproducibility Signals in Science: A preliminary analysis
Akhil Pandey Akella
|
Hamed Alhoori
|
David Koop
Proceedings of the first Workshop on Information Extraction from Scientific Publications
Reproducibility is an important feature of science; experiments are retested, and analyses are repeated. Trust in the findings increases when consistent results are achieved. Despite the importance of reproducibility, significant work is often involved in these efforts, and some published findings may not be reproducible due to oversights or errors. In this paper, we examine a myriad of features in scholarly articles published in computer science conferences and journals and test how they correlate with reproducibility. We collected data from three different sources that labeled publications as either reproducible or irreproducible and employed statistical significance tests to identify features of those publications that hold clues about reproducibility. We found the readability of the scholarly article and accessibility of the software artifacts through hyperlinks to be strong signals noticeable amongst reproducible scholarly articles.