Shuhaib Mehri


2025

pdf bib
Discourse Relation Recognition with Language Models Under Different Data Availability
Shuhaib Mehri | Chuyuan Li | Giuseppe Carenini
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)

Large Language Models (LLMs) have demonstrated remarkable performance across various NLP tasks, yet they continue to face challenges in discourse relation recognition (DRR). Current state-of-the-art methods for DRR primarily rely on smaller pre-trained language models (PLMs). In this study, we conduct a comprehensive analysis of different approaches using both PLMs and LLMs, evaluating their effectiveness for DRR at multiple granularities and under different data availability settings. Our findings indicate that no single approach consistently outperforms the others, and we offer a general comparison framework to guide the selection of the most appropriate model based on specific DRR requirements and data conditions.

2023

pdf bib
Automatic Evaluation of Generative Models with Instruction Tuning
Shuhaib Mehri | Vered Shwartz
Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

Automatic evaluation of natural language generation has long been an elusive goal in NLP. A recent paradigm fine-tunes pre-trained language models to emulate human judgements for a particular task and evaluation criterion. Inspired by the generalization ability of instruction-tuned models, we propose a learned metric based on instruction tuning. To test our approach, we collected HEAP, a dataset of human judgements across various NLG tasks and evaluation criteria. Our findings demonstrate that instruction tuning language models on HEAP yields good performance on many evaluation tasks, though some criteria are less trivial to learn than others. Further, jointly training on multiple tasks can yield additional performance improvements, which can be beneficial for future tasks with little to no human annotated data.