Akm Mahbubur Rahman

Also published as: AKM Mahbubur Rahman


2026

The growing scale of pre-trained language models poses a challenge in fine-tuning for downstream tasks, especially in resource-constrained settings. Recent studies highlight that not all layers in transformer-based language models contribute equally to downstream task performance, giving rise to various partial fine-tuning strategies. However, current methods often introduce significant training overhead or rely on simple heuristics that yield suboptimal performance and poor generalization. We propose PRiSM (Partial Ranking via inter-layer Semantic Measurement), a training-free approach for layer-wise partial fine-tuning that leverages the cosine similarity between pre-trained aggregate token representations across layers to identify inter-layer relationships. PRiSM comprises two stages: (i) scoring layers based on their relevance to the task via a single forward pass, and (ii) fine-tuning a subset of block-wise highest-scoring layers while keeping the others frozen. We conduct experiments on 15 diverse NLP datasets, including single-sentence and sentence-pair classification tasks. Our method achieves competitive performance compared to full fine-tuning, with an average training speedup of 1.5× and a 75% reduction in trainable parameters, and outperforms all compared baselines. Additionally, our approach shows no notable drop in performance when the evaluation tasks come from a different domain, demonstrating robust cross-domain generalizability.
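As a rough illustration of the two stages described above, the following Python sketch mean-pools token representations per layer, compares adjacent layers by cosine similarity, and then picks the highest-scoring layer in each block. The abstract does not specify the scoring rule or the block layout, so the functions `mean_pool`, `score_layers`, and `select_blockwise`, the "one minus similarity to the previous layer" score, and the fixed block size are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def mean_pool(hidden_states):
    # hidden_states: (num_states, num_tokens, hidden_dim), where index 0 is the
    # embedding output and index i is the output of layer i.
    # Returns one aggregate vector per state: (num_states, hidden_dim).
    return hidden_states.mean(axis=1)

def cosine_similarity_matrix(reps):
    # Pairwise cosine similarity between the pooled representations.
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def score_layers(hidden_states):
    # ASSUMED scoring rule (not from the abstract): a layer scores higher when
    # its pooled output differs more from the pooled output of the layer before.
    sim = cosine_similarity_matrix(mean_pool(hidden_states))
    return np.array([1.0 - sim[i - 1, i] for i in range(1, len(sim))])

def select_blockwise(scores, block_size, per_block=1):
    # Split layers into consecutive blocks and keep the top-scoring layer(s)
    # of each block; all other layers would stay frozen during fine-tuning.
    selected = []
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        top = np.argsort(block)[::-1][:per_block]
        selected.extend(sorted(int(start + i) for i in top))
    return selected
```

In a real setup the hidden states would come from a single forward pass of the pre-trained model over task data, and the selected indices would decide which layers keep their gradients enabled.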

2025

We present Team BD’s submission to the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors, under Track 1 (Mistake Identification) and Track 2 (Mistake Location). Both tracks involve three-class classification of tutor responses in educational dialogues: determining whether a tutor correctly recognizes a student’s mistake (Track 1) and whether the tutor pinpoints the mistake’s location (Track 2). Our system is built on MPNet, a Transformer-based language model that combines BERT and XLNet’s pre-training advantages. We fine-tuned MPNet on the task data using a class-weighted cross-entropy loss to handle class imbalance, and leveraged grouped cross-validation (10 folds) to maximize the use of limited data while avoiding dialogue overlap between training and validation. We then performed a hard-voting ensemble of the best models from each fold, which improves robustness and generalization by combining multiple classifiers. Our approach achieved strong results on both tracks, with exact-match macro-F1 scores of approximately 0.7110 for Mistake Identification and 0.5543 for Mistake Location on the official test set. We include a comprehensive analysis of our system’s performance, including confusion matrices and t-SNE visualizations to interpret classifier behavior, as well as a taxonomy of common errors with examples. We hope our ensemble-based approach and findings provide useful insights for designing reliable tutor response evaluation systems in educational dialogue settings.
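The three ingredients named above (class weighting, grouped folds, and hard voting) can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the submitted system: the helper names, the inverse-frequency weighting formula, the round-robin group assignment, and the smallest-label tie-break are all hypothetical choices, and the model training itself is omitted.

```python
from collections import Counter

def class_weights(labels):
    # ASSUMED weighting: inverse class frequency, n_samples / (n_classes * count),
    # so rarer classes contribute more to a weighted cross-entropy loss.
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def grouped_folds(group_ids, n_folds):
    # Assign whole groups (e.g. dialogue IDs) to folds round-robin, so that no
    # dialogue appears in both the training and validation split of a fold.
    unique = sorted(set(group_ids))
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    folds = []
    for f in range(n_folds):
        train = [i for i, g in enumerate(group_ids) if fold_of[g] != f]
        val = [i for i, g in enumerate(group_ids) if fold_of[g] == f]
        folds.append((train, val))
    return folds

def hard_vote(predictions_per_model):
    # predictions_per_model: one list of predicted labels per fold model.
    # Each example gets the most frequent label; ties go to the smallest label
    # (an arbitrary but deterministic choice).
    voted = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == best))
    return voted
```

For example, `hard_vote([[0, 1, 2], [0, 2, 2], [1, 1, 2]])` combines three hypothetical fold models' predictions for three test examples into a single label per example.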

2023