Mehrab Mustafy Rahman


2025

Overview of BLP-2025 Task 2: Code Generation in Bangla
Nishat Raihan | Mohammad Anas Jawad | Md Mezbaur Rahman | Noshin Ulfat | Pranav Gupta | Mehrab Mustafy Rahman | Santu Karmaker | Marcos Zampieri
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)

This paper presents an overview of the BLP-2025 shared task on Code Generation in Bangla, organized as part of the BLP workshop co-located with AACL. The task evaluates generative AI systems that produce executable Python code from natural-language prompts written in Bangla. It is the first shared task to address Bangla code generation. It attracted 152 participants across 63 teams, who made 488 submissions and contributed 15 system-description papers. Participating teams used both proprietary and open-source LLMs; prevalent strategies included prompt engineering, fine-tuning, and machine translation. The top Pass@1 reached 0.99 in the development phase and 0.95 in the test phase. In this report, we detail the task design, data, and evaluation protocol, and synthesize the methodological trends observed across submissions. Notably, the highest-performing systems did not rely on a single model but on a pipeline combining multiple AI tools and/or methods.
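For readers unfamiliar with the Pass@1 metric reported above, the following is a minimal sketch of how it is commonly computed: one generated program per problem, counted as correct only if it passes every test for that problem. The function names (`passes_all_tests`, `pass_at_1`) and the assert-string test format are illustrative assumptions, not the shared task's actual evaluation harness, which is not described in this abstract.

```python
from typing import List, Tuple


def passes_all_tests(candidate_source: str, test_asserts: List[str]) -> bool:
    """Return True if the candidate program runs and satisfies every assert.

    Hypothetical helper: uses plain exec() for illustration only. A real
    harness would run untrusted generated code in a sandbox with timeouts.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)   # define the candidate's functions
        for test in test_asserts:
            exec(test, namespace)           # raises AssertionError on failure
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as a miss.
        return False
    return True


def pass_at_1(samples: List[Tuple[str, List[str]]]) -> float:
    """Fraction of problems whose single generated program passes all tests."""
    if not samples:
        raise ValueError("need at least one problem")
    outcomes = [passes_all_tests(source, tests) for source, tests in samples]
    return sum(outcomes) / len(outcomes)


# Two toy problems: the first candidate is correct, the second is not.
toy_samples = [
    ("def add(a, b):\n    return a + b", ["assert add(1, 2) == 3"]),
    ("def add(a, b):\n    return a - b", ["assert add(1, 2) == 3"]),
]
score = pass_at_1(toy_samples)
```

Under this definition, a Pass@1 of 0.95 would mean that 95% of problems were solved by the first generated program.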

2022

BanglaRQA: A Benchmark Dataset for Under-resourced Bangla Language Reading Comprehension-based Question Answering with Diverse Question-Answer Types
Syed Mohammed Sartaj Ekram | Adham Arik Rahman | Md. Sajid Altaf | Mohammed Saidul Islam | Mehrab Mustafy Rahman | Md Mezbaur Rahman | Md Azam Hossain | Abu Raihan Mostofa Kamal
Findings of the Association for Computational Linguistics: EMNLP 2022

High-resource languages, such as English, have access to a plethora of datasets with various question-answer types resembling real-world reading comprehension. However, there is a severe lack of diverse and comprehensive question-answering datasets in under-resourced languages like Bangla. The available ones are either translated versions of English datasets with a narrow answer format or human-annotated datasets focused on a specific domain, question type, or answer type. To address these limitations, this paper introduces BanglaRQA, a reading comprehension-based Bangla question-answering dataset with various question-answer types. BanglaRQA consists of 3,000 context passages and 14,889 question-answer pairs created from those passages. The dataset comprises answerable and unanswerable questions covering four unique categories of questions and three types of answers. In addition, we implemented four different Transformer models for question answering on the proposed dataset. The best-performing model achieved an overall 62.42% EM and 78.11% F1 score. However, detailed analyses showed that performance varies across question-answer types, leaving room for substantial improvement. Furthermore, we demonstrated the effectiveness of BanglaRQA as a training resource by showing strong results on the bn_squad dataset. BanglaRQA therefore has the potential to advance future research by enhancing the capability of language models. The dataset and code are available at https://github.com/sartajekram419/BanglaRQA
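The EM and F1 figures above are the usual extractive question-answering metrics. As a rough illustration, the sketch below computes exact match and token-level F1 in the SQuAD style. It assumes simple whitespace tokenization and no further text normalization, which may differ from the normalization used in the paper; the function names are illustrative.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the whitespace-tokenized answers are identical, else 0.0."""
    return float(prediction.split() == reference.split())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Unanswerable questions: an empty prediction matches an empty reference.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# A prediction that contains the reference plus one extra token.
em_example = exact_match("ঢাকা শহর", "ঢাকা")
f1_example = token_f1("ঢাকা শহর", "ঢাকা")
```

Dataset-level scores such as those reported are then the averages of these per-question values, expressed as percentages.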