Mihir Panchal


2026

Multilingual large language models (LLMs) are increasingly deployed in linguistically diverse regions such as India, yet most interpretability tools remain tailored to English. Prior work shows that LLMs often operate in English-centric representation spaces, making cross-lingual interpretability a pressing concern. We introduce Indic-TunedLens, a novel interpretability framework for Indian languages that learns shared affine transformations. Unlike the standard Logit Lens, which decodes intermediate activations directly, Indic-TunedLens adjusts hidden states for each target language, aligning them with the target output distributions to enable more faithful decoding of model representations. We evaluate our framework on 10 Indian languages using the MMLU benchmark and find that it significantly improves over state-of-the-art interpretability methods, especially for morphologically rich, low-resource languages. Our results provide crucial insights into the layer-wise semantic encoding of multilingual transformers. Our model is available at https://huggingface.co/spaces/MihirRajeshPanchal/IndicTunedLens. Our code is available at https://github.com/MihirRajeshPanchal/IndicTunedLens.
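The core idea, decoding an intermediate hidden state through a learned affine map rather than directly through the unembedding matrix, can be sketched in a toy setting. This is a minimal illustration, not the paper's implementation: the dimensions, the least-squares fit, and the synthetic "drift" between layers are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50  # toy dimensions (assumption: real models are far larger)

# Unembedding matrix shared across layers, as in the standard Logit Lens
U = rng.normal(size=(vocab, d_model))

def logit_lens(h):
    """Standard Logit Lens: decode an intermediate hidden state directly."""
    return U @ h

def tuned_lens(h, W, b):
    """Tuned-lens-style decoding: apply a learned affine map before unembedding."""
    return U @ (W @ h + b)

# Fit the affine map by least squares so intermediate states predict
# final-layer states (a stand-in for the actual training objective; the
# paper's per-language shared transformations are not reproduced here).
H_mid = rng.normal(size=(100, d_model))       # toy intermediate activations
A_true = rng.normal(size=(d_model, d_model))  # unknown "drift" between layers
H_final = H_mid @ A_true.T                    # toy final-layer activations

W, *_ = np.linalg.lstsq(H_mid, H_final, rcond=None)
W = W.T
b = np.zeros(d_model)

# The affine map aligns the decoded logits with the final-layer distribution
h = H_mid[0]
err_plain = np.linalg.norm(logit_lens(h) - U @ H_final[0])
err_tuned = np.linalg.norm(tuned_lens(h, W, b) - U @ H_final[0])
```

In this toy setup the fitted map recovers the linear drift between layers, so the tuned decoding error is far smaller than direct Logit Lens decoding, which is the intuition behind adjusting hidden states before reading them out.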

2025

Large language models (LLMs) excel across diverse natural language processing tasks but remain opaque and unreliable. This thesis investigates how LLM reasoning can be made both interpretable and reliable through systematic analysis of internal dynamics and targeted interventions. Unlike prior work that examines reasoning broadly, this research focuses on two representative domains: puzzle solving, where reasoning steps can be precisely tracked, and ontological inference, where hierarchical structures constrain valid reasoning. The central questions are: (1) How can systematic error patterns in domain-specific reasoning be detected through layer-wise probing and mitigated through targeted interventions? (2) How can probing frameworks and middle-layer analyses reveal and enhance the computational mechanisms underlying inference? By combining probing methods, middle-layer investigations, and probe-guided interventions, the work aims to uncover interpretable reasoning patterns, identify systematic failure modes, and develop adaptive enhancement strategies. The expected outcome is a domain-grounded framework that advances both theoretical understanding of neural reasoning and the design of practical, trustworthy AI systems.
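Layer-wise probing, the workhorse method named above, amounts to training a simple classifier on each layer's activations and comparing accuracies to locate where a property is encoded. The sketch below is a hedged illustration on synthetic data, not the thesis's experimental setup: the dimensions, the fabricated "layers," and the binary property are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 12  # toy: 200 activation vectors, 12-dim hidden states (assumption)

# Synthetic "hidden states": layer A linearly encodes a binary property
# (e.g. whether a puzzle-solving step is valid), layer B is pure noise.
y = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)
layer_A = rng.normal(size=(n, d)) + np.outer(2 * y - 1, direction)
layer_B = rng.normal(size=(n, d))

def probe_accuracy(X, y, steps=500, lr=0.1):
    """Fit a logistic-regression probe by gradient descent; return train accuracy."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)   # gradient step on log loss
    return np.mean((X @ w > 0) == y)

acc_A = probe_accuracy(layer_A, y)
acc_B = probe_accuracy(layer_B, y)
```

Comparing `acc_A` against `acc_B` localizes the layer that carries the property; probe-guided interventions then target exactly those layers where the probe succeeds.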