Rabin Adhikari
2026
Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers
Rabin Adhikari
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Rabin Adhikari
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Mechanistic interpretability aims to reverse-engineer large language models (LLMs) into human-understandable computational circuits. However, the complexity of pretrained models often obscures the minimal mechanisms required for specific reasoning tasks. In this work, we train small, attention-only transformers from scratch on a symbolic version of the Indirect Object Identification (IOI) task, a benchmark for studying coreference-like reasoning in transformers. Surprisingly, a single-layer model with only two attention heads achieves perfect IOI accuracy, despite lacking MLPs and normalization layers. Through residual stream decomposition, spectral analysis, and embedding interventions, we find that the two heads specialize into additive and contrastive subcircuits that jointly implement IOI resolution. Furthermore, we show that a two-layer, one-head model achieves performance comparable to that of a multi-head model by composing information across layers primarily through query-key interactions. These results demonstrate that task-specific training induces highly interpretable, minimal circuits, offering a controlled testbed for probing the computational foundations of transformer reasoning.
2022
COVID-19-related Nepali Tweets Classification in a Low Resource Setting
Rabin Adhikari | Safal Thapaliya | Nirajan Basnet | Samip Poudel | Aman Shakya | Bishesh Khanal
Proceedings of the Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
Rabin Adhikari | Safal Thapaliya | Nirajan Basnet | Samip Poudel | Aman Shakya | Bishesh Khanal
Proceedings of the Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
Billions of people across the globe have been using social media platforms in their local languages to voice their opinions about the various topics related to the COVID-19 pandemic. Several organizations, including the World Health Organization, have developed automated social media analysis tools that classify COVID-19-related tweets to various topics. However, these tools that help combat the pandemic are limited to very few languages, making several countries unable to take their benefit. While multi-lingual or low-resource language-specific tools are being developed, there is still a need to expand their coverage, such as for the Nepali language. In this paper, we identify the eight most common COVID-19 discussion topics among the Twitter community using the Nepali language, set up an online platform to automatically gather Nepali tweets containing the COVID-19-related keywords, classify the tweets into the eight topics, and visualize the results across the period in a web-based dashboard. We compare the performance of two state-of-the-art multi-lingual language models for Nepali tweet classification, one generic (mBERT) and the other Nepali language family-specific model (MuRIL). Our results show that the models’ relative performance depends on the data size, with MuRIL doing better for a larger dataset. The annotated data, models, and the web-based dashboard are open-sourced at https://github.com/naamiinepal/covid-tweet-classification.