Fuad Rahman


2025

This paper describes our participation in Task 1A and Task 1B of the BLP Workshop, focused on Bangla Multi-task Hatespeech Identification. Our approach involves systematic evaluation of four transformer models: BanglaBERT, XLM-RoBERTa, IndicBERT, and Bengali Abusive MuRIL. To enhance performance, we implemented an ensemble strategy that averages output probabilities from these transformer models, which consistently outperformed individual models across both tasks. The baseline classical methods demonstrated limitations in capturing complex linguistic cues, underscoring the superiority of transformer-based approaches for low-resource hate speech detection. Our solution initially achieved F1 scores of 0.7235 (ranked 12th) for Task 1A and 0.6981 (ranked 17th) for Task 1B among participating teams. Through post-competition refinements, we improved our Task 1B performance to 0.7331, demonstrating the effectiveness of ensemble methods in Bangla hate speech detection.
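The probability-averaging ensemble mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the two-class setup, and the probability values are all hypothetical, and each model is assumed to have already produced per-class probabilities for every sample.

```python
def average_probabilities(model_probs):
    """Average class probabilities across models.

    model_probs: one entry per model; each entry is a list of samples,
    and each sample is a list of class probabilities.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    averaged = []
    for i in range(n_samples):
        row = [
            sum(model[i][c] for model in model_probs) / n_models
            for c in range(n_classes)
        ]
        averaged.append(row)
    return averaged


def predict(averaged):
    """Pick the class index with the highest averaged probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]


# Made-up probabilities for two samples and two classes (illustrative only).
model_a = [[0.6, 0.4], [0.9, 0.1]]
model_b = [[0.3, 0.7], [0.6, 0.4]]
model_c = [[0.2, 0.8], [0.4, 0.6]]

averaged = average_probabilities([model_a, model_b, model_c])
labels = predict(averaged)
```

In this toy input, the first model alone would assign the first sample to class 0, while the averaged probabilities favor class 1, which shows how averaging can override a single model's prediction.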
Large Language Models (LLMs) excel at complex reasoning tasks, yet their performance hinges on the quality of their prompts and pipeline structures. Manual prompt design, as used in frameworks like DSPy, poses significant limitations: it is time-intensive, demands substantial expertise, and lacks scalability, restricting the widespread use of LLMs across diverse applications. To overcome these challenges, we introduce AutoDSPy, the first framework to fully automate DSPy pipeline construction using reinforcement learning (RL). AutoDSPy leverages an RL-tuned policy network to dynamically select optimal reasoning modules (such as Chain-of-Thought for logical tasks or ReAct for tool integration) along with input-output signatures and execution strategies, entirely eliminating the need for manual configuration. Experimental results on the GSM8K and HotPotQA benchmarks demonstrate that AutoDSPy outperforms traditional DSPy baselines, achieving accuracy gains of up to 4.3% while reducing inference time, even with smaller models like GPT-2 (127M). By integrating RL-based automation, AutoDSPy enhances both efficiency and accessibility, simplifying the development of structured, high-performing LLM solutions and enabling scalability across a wide range of tasks.

2023

This research paper focuses on developing an improved Bangla Optical Character Recognition (OCR) system, addressing the challenges posed by the complexity of Bangla text structure, diverse handwriting styles, and the scarcity of comprehensive datasets. Leveraging recent advancements in Deep Learning and OCR techniques, we anticipate a significant enhancement in the performance of Bangla OCR by utilizing a large and diverse collection of labeled Bangla text image datasets. This study introduces the most extensive gold standard corpus for Bangla characters and words, comprising over 4 million human-annotated images. Our dataset encompasses various document types, such as Computer Compose, Letterpress, Typewriters, Outdoor Banner-Poster, and Handwritten documents, gathered from diverse sources. The entire corpus has undergone meticulous human annotation, employing a controlled annotation procedure consisting of three-step annotation and one-step validation, ensuring adherence to gold standard criteria. This paper provides a comprehensive overview of the complete data collection procedure. The ICT Division, Government of the People’s Republic of Bangladesh, will make the dataset publicly available, facilitating further research and development in Bangla OCR and related domains.

2002