Manav Malhotra


2026

Large Language Models (LLMs) often struggle to disentangle formal logical validity from real-world plausibility, a phenomenon known as "belief bias". This paper describes our submission to SemEval-2026 Task 11. We frame the task as a calibration problem between "System 1" (heuristic) and "System 2" (logical) thinking. Our experiments reveal that standard neuro-symbolic interventions, such as Structural Chain-of-Thought (CoT) and Nonsense Augmentation, degrade performance in low-resource regimes due to an "abstraction penalty". Instead, we propose a Conflict-Aware Logit Ensemble. We fine-tune two variants of Qwen-2.5-14B: a standard "Believer" model and a bias-hardened "Skeptic" model trained on oversampled conflict data. By ensembling their logits via soft-voting, we achieve a Pareto-optimal balance, reducing the Total Content Effect (TCE) to 3.21 while maintaining an overall accuracy of 94.27%, which yields a Combined Score of 39.09.
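
The soft-voting step can be illustrated with a minimal sketch. It is not the submission's implementation: the abstract does not specify how the two models' outputs are combined, so the sketch assumes each model emits two logits (invalid, valid), converts them to probabilities with a softmax, and averages them. The function names, the label order, the `skeptic_weight` parameter, and the example logits are all hypothetical.

```python
import math

# Assumed label order for the two-way validity decision (hypothetical).
LABELS = ("invalid", "valid")


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(believer_logits, skeptic_logits, skeptic_weight=0.5):
    """Weighted average of the two models' class probabilities.

    skeptic_weight is an illustrative knob, not a value from the paper:
    0.0 trusts only the Believer, 1.0 trusts only the Skeptic.
    """
    p_believer = softmax(believer_logits)
    p_skeptic = softmax(skeptic_logits)
    return [
        (1.0 - skeptic_weight) * b + skeptic_weight * s
        for b, s in zip(p_believer, p_skeptic)
    ]


def predict(believer_logits, skeptic_logits, skeptic_weight=0.5):
    """Return the label with the highest ensembled probability."""
    probs = soft_vote(believer_logits, skeptic_logits, skeptic_weight)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best]


if __name__ == "__main__":
    # Made-up logits: the Believer leans "valid" on a plausible-sounding
    # conclusion, while the Skeptic is confident the argument is invalid.
    believer = [0.2, 1.0]
    skeptic = [2.5, -0.5]
    print(soft_vote(believer, skeptic))
    print(predict(believer, skeptic))
```

Averaging probabilities rather than taking a hard majority vote lets a confident model outweigh a hesitant one, which is one plausible reading of how such an ensemble could trade off accuracy against content effect.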

2021

While there has been significant progress towards developing NLU resources for Indic languages, syntactic evaluation remains comparatively underexplored. Unlike English, Indic languages have rich morphosyntax, grammatical genders, free linear word-order, and highly inflectional morphology. In this paper, we introduce Vyākarana: a benchmark of Colorless Green sentences in Indic languages for syntactic evaluation of multilingual language models. The benchmark comprises four syntax-related tasks: PoS Tagging, Syntax Tree-depth Prediction, Grammatical Case Marking, and Subject-Verb Agreement. We use the datasets from the evaluation tasks to probe five multilingual language models of varying architectures for syntax in Indic languages. Because code-switching is prevalent in these languages, we also include a code-switching setting in our experiments. Our results show that the token-level and sentence-level representations from the Indic language models (IndicBERT and MuRIL) do not capture the syntax in Indic languages as efficiently as the other highly multilingual language models. Further, our layer-wise probing experiments reveal that while mBERT, DistilmBERT, and XLM-R localize the syntax in middle layers, the Indic language models do not show such syntactic localization.