S M Rafiuddin

Also published as: Rifat Rafiuddin

2026

Context-Conditioned Masked LoRA: Dynamic Rank Routing for Compute-Efficient Parameter-Efficient Fine-Tuning
Rifat Rafiuddin | Rafae Abdullah
Findings of the Association for Computational Linguistics: ACL 2026

Parameter-efficient fine-tuning methods such as LoRA reduce trainable parameters, but still apply dense low-rank updates per token, leaving adaptation compute largely fixed once rank is set. We propose Context-Conditioned Masked LoRA (CCM-LoRA), which learns a lightweight router that activates an input-dependent subset of LoRA rank directions, turning LoRA into dynamic rank routing and enabling contextual sparsity in fine-tuning and inference. CCM-LoRA is trained with a budget-constrained objective that targets an expected effective rank (or FLOPs) while regularizing routing to avoid degenerate always-on/off masks. Across public NLU and multilingual benchmarks, CCM-LoRA improves the accuracy–efficiency Pareto frontier versus static-rank LoRA and adaptive-rank baselines, matching or improving task performance at lower inference-time effective rank. We also provide a reproducible profiling protocol and analyses of rank usage, router overhead, and robustness under domain and language shift.
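As a rough illustration of the dynamic rank routing idea described in this abstract, the sketch below gates individual LoRA rank directions per input and computes only the active ones. It is not the authors' implementation: the linear router `W_router`, the sigmoid gate, the fixed `threshold`, and the function names are all assumptions made for the example, and the budget-constrained training objective is not shown.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def routed_lora_delta(x, A, B, W_router, threshold=0.5):
    """Return (delta, mask) for a single token vector x.

    A:        r x d_in   LoRA down-projection
    B:        d_out x r  LoRA up-projection
    W_router: r x d_in   hypothetical linear router, one logit per rank direction
    """
    logits = matvec(W_router, x)
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Hard mask at inference time (assumed rule: keep directions with gate > threshold).
    mask = [1 if g > threshold else 0 for g in gates]
    active = [i for i, m in enumerate(mask) if m]
    # Only active rank directions are projected, which is where compute is saved.
    h = {i: sum(a * xi for a, xi in zip(A[i], x)) for i in active}
    delta = [sum(B[o][i] * h[i] for i in active) for o in range(len(B))]
    return delta, mask

x = [1.0, -1.0]
A = [[1, 0], [0, 1], [1, 1]]          # rank r = 3
B = [[1, 2, 3], [4, 5, 6]]
W_router = [[2, 0], [0, 2], [0, 0]]   # toy router weights
delta, mask = routed_lora_delta(x, A, B, W_router)
print(mask, delta)
```

With every gate open, the result equals the dense LoRA update B(Ax); the effective rank for a given input is simply the number of ones in `mask`.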

MaskLoRA: Low-Rank Subspace–Induced Token Masking for Efficient and Faithful Language Models
Rifat Rafiuddin
Findings of the Association for Computational Linguistics: EACL 2026

2025

A Detailed Factor Analysis for the Political Compass Test: Navigating Ideologies of Large Language Models
Sadia Kamal | Lalu Prasad Yadav Prakash | S M Rafiuddin | Mohammed Rakib | Atriya Sen | Sagnik Ray Choudhury
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

The Political Compass Test (PCT) and similar surveys are commonly used to assess political bias in auto-regressive LLMs. Our rigorous statistical experiments show that while changes to standard generation parameters have minimal effect on PCT scores, prompt phrasing and fine-tuning individually and together can significantly influence results. Interestingly, fine-tuning on politically rich vs. neutral datasets does not lead to different shifts in scores. We also generalize these findings to a similar popular test called 8 Values. Humans do not change their responses to questions when prompted differently (“answer this question” vs “state your opinion”), or after exposure to politically neutral text, such as mathematical formulae. But the fact that the models do so raises concerns about the validity of these tests for measuring model bias, and paves the way for deeper exploration into how political and social views are encoded in LLMs.

A Formal Analysis of Chain-of-Thought Prompting via Turing Reductions
S M Rafiuddin | Muntaha Nujat Khan
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Chain-of-Thought (CoT) prompting has emerged as a powerful empirical technique for eliciting multi-step reasoning from large language models by decomposing complex tasks into sequential subprompts. However, the formal computational trade-offs between internal computation, query count, and space usage remain unexplored. We introduce the CoT-oracle Turing machine, a formal model in which each subprompt corresponds to an oracle query, and define three resource metrics: internal time T(n), query complexity Q(n), and prompt buffer space S_prompt(n). We prove that (T,Q)-bounded CoT machines exactly capture the class P^{O[Q(n)]} of polynomial-time Turing reductions with Q(n) queries, derive upper bounds for P and NP-complete problems under linear and prefix-query budgets, and establish an Ω(n) query lower bound for SAT under P ≠ NP. Illustrative examples on integer factorization and SAT reconstruction, together with synthetic and LLM-based simulations, confirm our theoretical T–Q–S trade-off predictions. This framework provides principled guidelines for prompt design, noisy-oracle robustness, and cost-aware reasoning.

Learning What to Remember: Adaptive Probabilistic Memory Retention for Memory-Efficient Language Models
S M Rafiuddin | Muntaha Nujat Khan
Findings of the Association for Computational Linguistics: EMNLP 2025

Transformer attention scales quadratically with sequence length, O(n²), limiting long-context use. We propose Adaptive Retention, a probabilistic, layer-wise token selection mechanism that learns which representations to keep under a strict global budget M. Retention is modeled with Bernoulli gates trained via a Hard-Concrete/variational relaxation and enforced with a simple top-M rule at inference, making the method differentiable and drop-in for standard encoders. Across classification, extractive QA, and long-document summarization, keeping only 30–50% of tokens preserves ≥95% of full-model performance while cutting peak memory by ~35–45% and improving throughput by up to ~1.8×. This architecture-agnostic approach delivers practical long-context efficiency without modifying base attention or task heads.
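The inference-time top-M rule mentioned in this abstract can be pictured with a minimal sketch: given per-token retention probabilities, keep the M highest-scoring tokens and preserve their original order. The function name `top_m_retain` and the toy probabilities are assumptions for illustration only; the learned gates, the Hard-Concrete relaxation, and the layer-wise budget allocation from the paper are not modeled here.

```python
def top_m_retain(tokens, keep_probs, m):
    """Keep the m tokens with the highest retention probability, in original order."""
    if m >= len(tokens):
        return list(tokens)
    # Rank token positions by their (assumed, precomputed) retention probability.
    ranked = sorted(range(len(tokens)), key=lambda i: keep_probs[i], reverse=True)
    # Restore sequence order so downstream layers see tokens in their original order.
    kept = sorted(ranked[:m])
    return [tokens[i] for i in kept]

tokens = ["the", "cat", "sat", "on", "mat"]
keep_probs = [0.1, 0.9, 0.7, 0.2, 0.8]   # toy values, not learned
print(top_m_retain(tokens, keep_probs, 3))
```

Because the cut is a fixed count rather than a probability threshold, the number of retained tokens never exceeds the budget, whatever the gate values are.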

Co-authors

Mohammed Rakib 1

Atriya Sen 1
