Laksh Advani
2026
lakshadvani at SemEval-2026 Task 11: A Neuro-Symbolic Approach to Content-Independent Syllogistic Reasoning
Laksh Advani
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
We describe our system for SemEval-2026 Task 11 on disentangling content from formal reasoning. The content effect in syllogistic reasoning, where models judge validity based on conclusion plausibility rather than logical structure, persists even with explicit instructions to ignore real-world knowledge. We find that this bias is better addressed structurally than through prompting: by restricting the LLM to a translation role (mapping natural language to abstract variables) and delegating all deductive reasoning to a deterministic checker over the 24 valid Aristotelian forms, we eliminate content bias entirely on Subtask 1 (100.0 combined, TCE=0.0, 4th place). Our Subtask 2 system, which lacks this separation, scores 41.08 (7th place) despite 95.26% accuracy and 99.47% premise retrieval F1, because a TCE of 2.94 incurs a 58% penalty. A three-way ablation on training data using GPT-5 confirms the pattern: Vanilla LLM: 78% accuracy / TCE=19; LLM + Aristotelian Rules in Prompt: 90% accuracy / TCE=5; LLM + Symbolic Checker: 97% accuracy / TCE=3.
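To illustrate what a deterministic checker over the 24 valid Aristotelian forms might look like, here is a minimal sketch. It is not the author's implementation: the `Prop` type and the `figure_of` and `is_valid` functions are hypothetical names, and the sketch assumes the LLM has already translated each sentence into a categorical form (A, E, I, or O) over abstract term labels. The table is the traditional list of valid mood and figure pairs, including the forms that rely on existential import.

```python
from typing import NamedTuple, Optional


class Prop(NamedTuple):
    """A categorical proposition over abstract terms (hypothetical type)."""
    form: str       # "A" (all), "E" (no), "I" (some), "O" (some ... not)
    subject: str
    predicate: str


# Traditional 24 valid moods, grouped by figure.
VALID_FORMS = {
    1: {"AAA", "EAE", "AII", "EIO", "AAI", "EAO"},
    2: {"EAE", "AEE", "EIO", "AOO", "EAO", "AEO"},
    3: {"AAI", "IAI", "AII", "EAO", "OAO", "EIO"},
    4: {"AAI", "AEE", "IAI", "EAO", "EIO", "AEO"},
}


def figure_of(major: Prop, minor: Prop, s: str, p: str) -> Optional[int]:
    """Return the figure (1-4), or None if the terms do not form a syllogism."""
    major_terms = {major.subject, major.predicate}
    minor_terms = {minor.subject, minor.predicate}
    if s == p or len(major_terms) != 2 or len(minor_terms) != 2:
        return None
    if p not in major_terms or s not in minor_terms:
        return None
    (m,) = major_terms - {p}          # middle term
    if minor_terms != {s, m}:
        return None
    major_m_first = major.subject == m   # M-P rather than P-M
    minor_s_first = minor.subject == s   # S-M rather than M-S
    if major_m_first and minor_s_first:
        return 1
    if not major_m_first and minor_s_first:
        return 2
    if major_m_first and not minor_s_first:
        return 3
    return 4


def is_valid(premise_a: Prop, premise_b: Prop, conclusion: Prop) -> bool:
    """Check validity by table lookup, trying either premise as the major."""
    s, p = conclusion.subject, conclusion.predicate
    for major, minor in ((premise_a, premise_b), (premise_b, premise_a)):
        fig = figure_of(major, minor, s, p)
        if fig is not None:
            mood = major.form + minor.form + conclusion.form
            if mood in VALID_FORMS[fig]:
                return True
    return False
```

Because the lookup sees only the form letters and term positions, the plausibility of the conclusion cannot influence the verdict, which is the separation the abstract describes.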
2023
Effective Proxy for Human Labeling: Ensemble Disagreement Scores in Large Language Models for Industrial NLP
Wei Du | Laksh Advani | Yashmeet Gambhir | Daniel Perry | Prashant Shiralkar | Zhengzheng Xing | Aaron Colak
Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Large language models (LLMs) have demonstrated significant capability to generalize across a large number of NLP tasks. For industry applications, it is imperative to assess the performance of the LLM on unlabeled production data from time to time to validate for a real-world setting. Human labeling to assess model error requires considerable expense and time delay. Here we demonstrate that ensemble disagreement scores work well as a proxy for human labeling for language models in zero-shot, few-shot, and fine-tuned settings, per our evaluation on a keyphrase extraction (KPE) task. We measure fidelity of the results by comparing to true error measured from human labeled ground truth. We contrast with the alternative of using another LLM as a source of machine labels, or ‘silver labels’. Results across various languages and domains show disagreement scores provide a better estimation of model performance with mean average error (MAE) as low as 0.4% and on average 13.8% better than using silver labels.
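The abstract does not define the disagreement score, so the following is only a rough sketch of one plausible reading: the average fraction of unlabeled examples on which each ensemble member's prediction differs from the deployed model's prediction. The function name `disagreement_score` and the exact-match comparison are assumptions for illustration. A keyphrase extraction setup would more likely compare sets of phrases with a softer overlap measure.

```python
from typing import Hashable, Sequence


def disagreement_score(
    primary: Sequence[Hashable],
    ensemble: Sequence[Sequence[Hashable]],
) -> float:
    """Mean per-member rate of disagreement with the primary model.

    Assumption: predictions are compared by exact equality, per example.
    No human labels are needed; only model outputs on the same inputs.
    """
    if not primary or not ensemble:
        raise ValueError("need at least one example and one ensemble member")
    rates = []
    for member in ensemble:
        if len(member) != len(primary):
            raise ValueError("all prediction lists must have the same length")
        differing = sum(a != b for a, b in zip(primary, member))
        rates.append(differing / len(primary))
    return sum(rates) / len(rates)
```

Under this reading, a higher score on a batch of production data would suggest higher model error on that batch, which is the quantity the paper compares against error measured from human labels.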
2020
C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed Social Media Text Using Feature Engineering
Laksh Advani | Clement Lu | Suraj Maharjan
Proceedings of the Fourteenth Workshop on Semantic Evaluation
In today’s interconnected and multilingual world, code-mixing of languages on social media is a common occurrence. While many Natural Language Processing (NLP) tasks like sentiment analysis are mature and well designed for monolingual text, techniques to apply these tasks to code-mixed text still warrant exploration. This paper describes our feature engineering approach to sentiment analysis in code-mixed social media text for SemEval-2020 Task 9: SentiMix. We tackle this problem by leveraging a set of hand-engineered lexical, sentiment, and metadata features to design a classifier that can disambiguate between “positive”, “negative” and “neutral” sentiment. With this model we are able to obtain a weighted F1 score of 0.65 for the “Hinglish” task and 0.63 for the “Spanglish” tasks.