Mahbub E Sobhani
2025
BRACU_CL at BLP-2025 Task 2: CodeMist: A Transformer-Based Framework for Bangla Instruction-to-Code Generation
Md. Fahmid-Ul-Alam Juboraj | Soumik Deb Niloy | Mahbub E Sobhani | Farig Sadeque
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
This study proposes a hybrid framework for Bangla-to-Python code generation, emphasizing improved code accuracy through a two-phase pipeline: generation followed by debugging. During development, standalone models such as TigerLLM and StarCoder achieved modest accuracies of 27% and 24%, respectively, while more advanced models, Gemini-1.5-flash and Gemma, reached 60% and 64%. Integrating Gemma with the gpt-oss debugger substantially increased accuracy to 99.75%, highlighting the critical role of a dedicated debugging stage. In testing on unseen data, gpt-oss alone achieved 67%, which improved to 71% with self-debugging. The highest performance, 84%, was obtained by pairing Gemini-2.5-flash as the generator with gpt-oss as the debugger. These findings demonstrate that combining a strong generative model with an effective debugging component yields superior and robust code generation results, outperforming existing approaches such as TigerLLM. The full implementation of the framework is publicly available at https://github.com/fahmid-juboraj/Code_generation.
2023
Advancing Bangla Punctuation Restoration by a Monolingual Transformer-Based Method and a Large-Scale Corpus
Mehedi Hasan Bijoy | Mir Fatema Afroz Faria | Mahbub E Sobhani | Tanzid Ferdoush | Swakkhar Shatabda
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)
Punctuation restoration is the task of reinstating and rectifying missing or improper punctuation marks within a text, thereby eliminating ambiguity in written discourse. The Bangla punctuation restoration task has received little attention and exploration, despite the rising popularity of textual communication in the language. The primary hindrances to the advancement of the task revolve around the utilization of transformer-based methods and an openly accessible extensive corpus, challenges that we discovered remained unresolved in earlier efforts. In this study, we propose a baseline by introducing a monolingual transformer-based method named Jatikarok, in which the effectiveness of transfer learning has been meticulously scrutinized, and a large-scale corpus containing 1.48M source-target pairs to resolve the previous issues. Jatikarok attains accuracy rates of 95.2%, 85.13%, and 91.36% on the BanglaPRCorpus, Prothom-Alo Balanced, and BanglaOPUS corpora, respectively, thereby establishing itself as the state-of-the-art method through its superior performance compared to BanglaT5 and T5-Small. Jatikarok and BanglaPRCorpus are publicly available at: https://github.com/mehedihasanbijoy/Jatikarok-and-BanglaPRCorpus