Ibrahim Al Azher

2025

BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text
Ibrahim Al Azher | Miftahul Jannat Mokarrama | Zhishuai Guo | Sagnik Ray Choudhury | Hamed Alhoori
Findings of the Association for Computational Linguistics: EMNLP 2025

In scientific research, “limitations” refer to the shortcomings, constraints, or weaknesses of a study. Transparent reporting of such limitations can enhance the quality and reproducibility of research and improve public trust in science. However, authors often underreport limitations in their papers and rely on hedging strategies to meet editorial requirements at the expense of readers’ clarity and confidence. This tendency, combined with the surge in scientific publications, has created a pressing need for automated approaches to extract and generate limitations from scholarly papers. To address this need, we present a full architecture for computational analysis of research limitations. Specifically, we (1) create a dataset of limitations from ACL, NeurIPS, and PeerJ papers by extracting them from the text and supplementing them with external reviews; (2) propose methods to automatically generate limitations using a novel Retrieval Augmented Generation (RAG) technique; and (3) design a fine-grained evaluation framework for generated limitations, along with a meta-evaluation of these techniques. Code: https://github.com/IbrahimAlAzhar/BAGELS_Limitation_Gen | Dataset: https://huggingface.co/datasets/IbrahimAlAzhar/limitation-generation-dataset-bagels
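
The RAG technique itself is not detailed in the abstract; the following is a minimal sketch assuming a generic retrieve-then-generate pipeline built on sentence-transformers embeddings and an OpenAI-style chat model. The model names, retrieval query, and prompt are illustrative placeholders, not the paper's actual method.

    # Minimal retrieve-then-generate sketch (illustrative, not the paper's pipeline):
    # rank paper sections against a "limitations" query, then prompt an LLM with
    # the top sections as context. Assumes sentence-transformers and the OpenAI
    # Python client (>=1.0); OPENAI_API_KEY must be set in the environment.
    from sentence_transformers import SentenceTransformer, util
    from openai import OpenAI

    def generate_limitations(sections: list[str], top_k: int = 3) -> str:
        encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
        query_emb = encoder.encode(
            "limitations, shortcomings, and constraints of this study",
            convert_to_tensor=True,
        )
        section_embs = encoder.encode(sections, convert_to_tensor=True)
        hits = util.semantic_search(query_emb, section_embs, top_k=top_k)[0]
        context = "\n\n".join(sections[h["corpus_id"]] for h in hits)

        client = OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": "You draft the Limitations section of a research paper."},
                {"role": "user",
                 "content": f"Relevant excerpts:\n{context}\n\nWrite a concise Limitations section."},
            ],
        )
        return response.choices[0].message.content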

Predicting The Scholarly Impact of Research Papers Using Retrieval-Augmented LLMs
Tamjid Azad | Ibrahim Al Azher | Sagnik Ray Choudhury | Hamed Alhoori
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)

Assessing a research paper’s scholarly impact is an important phase in the scientific research process; however, impact metrics typically take time after publication to capture it accurately. Our study examines how accurately Large Language Models (LLMs) can predict scholarly impact. We use Retrieval-Augmented Generation (RAG) to examine how much LLM performance improves over zero-shot prompting. Results show that Llama3-8b with RAG achieved the best overall performance, while Gemma-7b benefited the most from RAG, exhibiting the largest reduction in Mean Absolute Error (MAE). Our findings suggest that retrieval-augmented LLMs offer a promising approach for early research evaluation. Our code and dataset for this project are publicly available.
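
As a rough illustration of the zero-shot vs. RAG comparison evaluated with Mean Absolute Error, here is a hedged sketch; the prompt wording, score parsing, and retrieval callable are assumptions rather than the authors' setup.

    # Illustrative evaluation loop (not the authors' code): compare zero-shot and
    # RAG prompting for impact prediction and report Mean Absolute Error (MAE).
    # `llm` is any callable mapping a prompt string to a text reply; `retrieve`
    # optionally returns context about similar papers for the RAG condition.
    import re
    from statistics import mean

    def parse_score(reply: str) -> float:
        # Assumes the model answers with a single number somewhere in its reply.
        match = re.search(r"-?\d+(\.\d+)?", reply)
        return float(match.group()) if match else 0.0

    def mean_absolute_error(preds: list[float], targets: list[float]) -> float:
        return mean(abs(p - t) for p, t in zip(preds, targets))

    def evaluate(abstracts, targets, llm, retrieve=None):
        preds = []
        for abstract in abstracts:
            context = f"Similar papers:\n{retrieve(abstract)}\n\n" if retrieve else ""
            prompt = (f"{context}Abstract:\n{abstract}\n\n"
                      "Predict this paper's citation-based impact score as a single number.")
            preds.append(parse_score(llm(prompt)))
        return mean_absolute_error(preds, targets)

    # Usage (hypothetical callables):
    #   evaluate(abstracts, true_scores, llm=my_model)                            # zero-shot
    #   evaluate(abstracts, true_scores, llm=my_model, retrieve=my_retriever)     # RAG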