Abhinav P M

Also published as: Abhinav PM

2026

The Riddle of Reflection: Evaluating Reasoning and Self-Awareness in Multilingual LLMs Using Indian Riddles
Abhinav P M | Ojasva Saxena | Oswald C | Parameswari Krishnamurthy
Proceedings of the Fifteenth Language Resources and Evaluation Conference

The extent to which large language models (LLMs) can perform culturally grounded reasoning across non-English languages remains underexplored. This paper examines the reasoning and self-assessment abilities of LLMs across seven major Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Tamil, and Telugu. We introduce a multilingual riddle dataset combining traditional riddles with context-reconstructed variants and evaluate five LLMs (Gemini 2.5 Pro, Gemini 2.5 Flash, Mistral-Saba, LLaMA-4-Scout, and LLaMA-4-Maverick) under seven prompting strategies. In the first stage, we assess riddle-solving performance and find that while Gemini 2.5 Pro performs best overall, few-shot methods yield only marginal gains, and accuracy varies notably across languages. In the second stage, we conduct a self-evaluation experiment to measure reasoning consistency. The results reveal a key finding: a model’s initial accuracy is inversely correlated with its ability to identify its own mistakes. Top-performing models such as Gemini 2.5 Pro are overconfident (4.34% True Negative Rate), whereas lower-performing models like LLaMA-4-Scout are substantially more self-aware (42.09% True Negative Rate). These results point to clear gaps in multilingual reasoning and highlight the need for models that not only reason effectively but also recognize their own limitations.
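The True Negative Rate quoted in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not give the formula, that a "negative" is an item the model initially answered incorrectly and that a true negative is such an item that the model also judges incorrect during self-evaluation. The record fields `answer_correct` and `self_judged_correct` are hypothetical names chosen for illustration.

```python
def true_negative_rate(records):
    """Fraction of initially wrong answers that the model itself flags as wrong.

    Each record is a dict with two hypothetical boolean fields:
      answer_correct       - was the model's first answer right?
      self_judged_correct  - did the model later judge its answer as right?
    Returns None when the model made no mistakes (the rate is undefined).
    """
    # Keep only the items the model got wrong in the first stage.
    negatives = [r for r in records if not r["answer_correct"]]
    if not negatives:
        return None
    # Count the mistakes the model recognised during self-evaluation.
    flagged = sum(1 for r in negatives if not r["self_judged_correct"])
    return flagged / len(negatives)


# Toy data: four wrong answers, only one of which the model flags as wrong.
toy_records = [
    {"answer_correct": True, "self_judged_correct": True},
    {"answer_correct": False, "self_judged_correct": True},
    {"answer_correct": False, "self_judged_correct": True},
    {"answer_correct": False, "self_judged_correct": True},
    {"answer_correct": False, "self_judged_correct": False},
]
tnr = true_negative_rate(toy_records)
```

Under this reading, a low rate means the model rarely recognises its own errors, which is the sense in which the abstract calls a model overconfident.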

2025

Family helps one another: Dravidian NLP suite for Natural Language Understanding
Abhinav P M | Priyanka Dasari | Nagaraju Vuppala | Parameswari Krishnamurthy
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Developing robust Natural Language Understanding (NLU) for morphologically rich Dravidian languages like Kannada, Malayalam, Tamil, and Telugu presents significant challenges due to their agglutinative nature and syntactic complexity. In this work, we present the Dravidian NLP Suite, which tackles five core tasks, Morphological Analysis (MA), POS Tagging (POS), Named Entity Recognition (NER), Dependency Parsing (DEP), and Coreference Resolution (CR), using both monolingual and multilingual models. To facilitate this, we present the Dravida dataset, a meticulously annotated multilingual corpus for these tasks across all four languages. Our experiments demonstrate that a multilingual model, which utilizes shared linguistic features and cross-lingual patterns inherent to the Dravidian family, consistently outperforms its monolingual counterparts across all tasks. These findings suggest that multilingual learning is an effective approach for enhancing NLU capabilities, particularly for languages belonging to the same family. To the best of our knowledge, this is the first work to jointly address all these core tasks on the Dravidian languages.

VIDAI: VIDukathAI Interpretation Through Analysis of In-context Reasoning in Tamil using LLMs
R S Mughil Srinivasan | Kesavan T | Abhijith Balan | Abhinav P M | Parameswari Krishnamurthy | Oswald C
Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation

2024

MTNLP-IIITH: Machine Translation for Low-Resource Indic Languages
Abhinav P M | Ketaki Shetye | Parameswari Krishnamurthy
Proceedings of the Ninth Conference on Machine Translation

Machine Translation for low-resource languages presents significant challenges, primarily due to limited data availability. We build a baseline model and a primary model. For the baseline model, we first fine-tune the mBART model (mbart-large-50-many-to-many-mmt) for the language pairs English-Khasi, Khasi-English, English-Manipuri, and Manipuri-English. We then augment the dataset by back-translating from Indic languages to English. To enhance data quality, we fine-tune the LaBSE model specifically for Khasi and Manipuri, generate sentence embeddings, and apply a cosine similarity threshold of 0.84 to filter out low-quality back-translations. The filtered data is combined with the original training data and used to further fine-tune the mBART model, creating our primary model. The results show that the primary model slightly outperforms the baseline model, with the best performance achieved by the English-to-Khasi (en-kh) primary model, which recorded a BLEU score of 0.0492, a chrF score of 0.3316, and a METEOR score of 0.2589 (on a scale of 0 to 1), with similar results for other language pairs.
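The filtering step described in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes the similarity is computed between the embedding of each source sentence and the embedding of its back-translation, and that pairs scoring at or above 0.84 are kept (whether the threshold is inclusive is not stated). The toy vectors stand in for LaBSE embeddings, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Treat a zero vector as maximally dissimilar to avoid division by zero.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_back_translations(pairs, threshold=0.84):
    """Keep (source, back_translation) pairs whose embeddings are similar enough.

    `pairs` holds tuples of (source, back_translation, source_vec, bt_vec);
    the vectors are assumed to come from a sentence-embedding model.
    """
    kept = []
    for source, back_translation, src_vec, bt_vec in pairs:
        if cosine_similarity(src_vec, bt_vec) >= threshold:
            kept.append((source, back_translation))
    return kept


# Toy 2-D vectors in place of real sentence embeddings.
toy_pairs = [
    ("src-1", "bt-1", [1.0, 0.0], [0.9, 0.1]),  # nearly parallel: kept
    ("src-2", "bt-2", [1.0, 0.0], [0.0, 1.0]),  # orthogonal: dropped
]
kept = filter_back_translations(toy_pairs)
```

In the pipeline the abstract describes, the pairs that survive this filter would then be merged with the original training data before the further fine-tuning step.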

Co-authors

Ketaki Shetye 1

R S Mughil Srinivasan 1

Kesavan T 1

Nagaraju Vuppala 1
