S M Rafiuddin


2025

Transformer attention scales quadratically with sequence length, O(n²), limiting long-context use. We propose Adaptive Retention, a probabilistic, layer-wise token selection mechanism that learns which representations to keep under a strict global budget M. Retention is modeled with Bernoulli gates trained via a Hard-Concrete/variational relaxation and enforced with a simple top-M rule at inference, making the method differentiable and drop-in for standard encoders. Across classification, extractive QA, and long-document summarization, keeping only 30–50% of tokens preserves ≥95% of full-model performance while cutting peak memory by ~35–45% and improving throughput by up to ~1.8×. This architecture-agnostic approach delivers practical long-context efficiency without modifying base attention or task heads.
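
As a concrete reading of the gating mechanism, the sketch below implements per-token Bernoulli gates with a Hard-Concrete relaxation during training and a hard top-M rule at inference. It is a minimal PyTorch illustration under assumed hyperparameters (β, γ, ζ follow Louizos et al.'s L0 formulation); the module name, scorer, and budget handling are ours, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AdaptiveRetentionGate(nn.Module):
    """Minimal sketch: per-token Bernoulli retention gates trained with a
    Hard-Concrete relaxation and replaced by a hard top-M rule at inference.
    Names and hyperparameters are illustrative."""

    def __init__(self, d_model, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # per-token log-odds of keeping
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, h, budget_m):
        # h: (batch, seq_len, d_model); budget_m: global token budget M
        log_alpha = self.scorer(h).squeeze(-1)              # (batch, seq_len)
        if self.training:
            # Differentiable Hard-Concrete sample, stretched then clipped to [0, 1]
            u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + log_alpha) / self.beta)
            s_bar = s * (self.zeta - self.gamma) + self.gamma
            z = s_bar.clamp(0.0, 1.0)                       # soft gates
        else:
            # Hard top-M selection: keep only the M highest-scoring tokens
            m = min(budget_m, log_alpha.size(-1))
            idx = log_alpha.topk(m, dim=-1).indices
            z = torch.zeros_like(log_alpha).scatter_(-1, idx, 1.0)
        return h * z.unsqueeze(-1), z
```

In training, a budget term, e.g. a penalty on the expected number of open gates (the Hard-Concrete L0 surrogate), would be added to the loss to keep the expected kept-token count near M; that term is omitted here for brevity.
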
Chain-of-Thought (CoT) prompting has emerged as a powerful empirical technique for eliciting multi-step reasoning from large language models by decomposing complex tasks into sequential subprompts. However, the formal computational trade-offs between internal computation, query count, and space usage remain unexplored. We introduce the CoT-oracle Turing machine, a formal model in which each subprompt corresponds to an oracle query, and define three resource metrics: internal time T(n), query complexity Q(n), and prompt buffer space S_prompt(n). We prove that (T,Q)-bounded CoT machines exactly capture the class P^O[Q(n)] of polynomial-time Turing reductions with Q(n) queries, derive upper bounds for P and NP-complete problems under linear and prefix-query budgets, and establish an Ω(n) query lower bound for SAT under P ≠ NP. Illustrative examples on integer factorization and SAT reconstruction, together with synthetic and LLM-based simulations, confirm our theoretical T–Q–S trade-off predictions. This framework provides principled guidelines for prompt design, noisy-oracle robustness, and cost-aware reasoning.
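
To make the query-budget accounting concrete, here is a minimal sketch of the SAT-reconstruction example as a standard search-to-decision self-reduction: each decision query stands in for one CoT subprompt, and a satisfying assignment over n variables is recovered with exactly n queries, matching a linear budget Q(n) = n. The brute-force oracle and all names are illustrative, not the paper's construction.

```python
from itertools import product

def sat_oracle(clauses, n_vars, fixed):
    """Toy SAT decision oracle (brute force; stands in for one CoT subprompt).
    clauses: list of clauses, each a list of signed ints (DIMACS-style).
    fixed: dict var -> bool of already-fixed variables."""
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        assign = {**fixed, **dict(zip(free, bits))}
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def reconstruct_assignment(clauses, n_vars):
    """Search-to-decision self-reduction: assuming the formula is satisfiable,
    recovers a satisfying assignment with exactly n oracle queries (one per
    variable), matching the linear query budget Q(n) = n."""
    fixed, queries = {}, 0
    for v in range(1, n_vars + 1):
        queries += 1  # one subprompt / oracle query
        fixed[v] = sat_oracle(clauses, n_vars, {**fixed, v: True})
    return fixed, queries

# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
clauses = [[1, -2], [2, 3], [-1, -3]]
print(reconstruct_assignment(clauses, 3))  # ({1: True, 2: True, 3: False}, 3)
```
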
The Political Compass Test (PCT) and similar surveys are commonly used to assess political bias in auto-regressive LLMs. Our rigorous statistical experiments show that while changes to standard generation parameters have minimal effect on PCT scores, prompt phrasing and fine-tuning, both individually and in combination, can significantly influence results. Interestingly, fine-tuning on politically rich vs. neutral datasets does not lead to different shifts in scores. We also generalize these findings to 8 Values, a similar popular test. Humans do not change their responses to questions when prompted differently (“answer this question” vs. “state your opinion”), nor after exposure to politically neutral text such as mathematical formulae. The fact that the models do raises concerns about the validity of these tests for measuring model bias, and paves the way for deeper exploration into how political and social views are encoded in LLMs.
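
A minimal sketch of the kind of phrasing-sensitivity check described above, assuming a hypothetical model_fn that maps a prompt to a scalar agreement score; the templates, helper names, and choice of significance test are ours, not the paper's protocol.

```python
import numpy as np
from scipy import stats

# Two phrasings of the same survey items; humans answer these identically,
# so any score gap reflects model-specific instability (templates illustrative).
PHRASINGS = ["Answer this question: {q}", "State your opinion: {q}"]

def pct_scores(model_fn, questions, template, n_runs=20):
    # model_fn is a hypothetical stand-in: prompt -> agreement score in [-2, 2].
    # Each run averages raw item responses; real PCT scoring maps items onto
    # the test's two axes, which we skip here for brevity.
    return np.array([
        np.mean([model_fn(template.format(q=q)) for q in questions])
        for _ in range(n_runs)
    ])

def phrasing_effect(model_fn, questions):
    # Compare score distributions under the two phrasings across repeated
    # sampled runs; a significant difference is the instability reported above.
    a = pct_scores(model_fn, questions, PHRASINGS[0])
    b = pct_scores(model_fn, questions, PHRASINGS[1])
    return stats.ttest_ind(a, b)
```
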