Rakshith R

2026

AICOE-Tredence at SemEval-2026 Task 11: Mitigating Content Bias in Syllogisms via Symbolic Logic-Language Decoupling
Rakshith R | Ankush Chopra
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Content bias remains a key limitation of large language models (LLMs), which often conflate formal logical validity with real-world plausibility. SemEval-2026 Task 11 examines this challenge through multilingual syllogistic reasoning, requiring models to judge validity independently of content. We propose a structure-first reasoning paradigm that abstracts natural language syllogisms into Aristotelian logical forms. By mapping arguments to mood–figure representations and classifying validity in this symbolic space, our approach removes semantic content from the reasoning process. On the private test sets of Subtasks 1 and 3, our method achieves a perfect combined score, with 100% validity accuracy and zero content bias in both English and multilingual settings using Gemini-3 Pro Preview. We also explore transferring this paradigm to smaller models via structural supervision, finding that distilled systems retain high accuracy with minimal bias. These results suggest that explicitly separating logical form from linguistic content is a promising direction for bias-resilient and cross-lingually robust reasoning in LLMs.
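To illustrate why validity can be decided once an argument is reduced to mood and figure, here is a minimal sketch. It is not the authors' pipeline, and the names `is_valid`, `holds` and `FIGURES` are hypothetical. It assumes the premises have already been abstracted into the three standard forms, with S, M and P as the minor, middle and major terms. It checks validity by brute force over all 2^8 assignments of empty or non-empty Venn regions. It uses modern Boolean semantics without existential import, which is an assumption about the task's convention.

```python
from itertools import product

# The eight Venn regions for terms S, M, P, each given as the set of terms it lies inside.
TERMS = ("S", "M", "P")
REGIONS = [frozenset(t for t, bit in zip(TERMS, bits) if bit)
           for bits in product((0, 1), repeat=3)]

def holds(kind, x, y, inhabited):
    """Evaluate proposition `kind` with subject x and predicate y, given the non-empty regions."""
    if kind == "A":  # All x are y: no inhabited region is x but not y
        return not any(x in r and y not in r for r in inhabited)
    if kind == "E":  # No x are y: no inhabited region is both x and y
        return not any(x in r and y in r for r in inhabited)
    if kind == "I":  # Some x are y
        return any(x in r and y in r for r in inhabited)
    if kind == "O":  # Some x are not y
        return any(x in r and y not in r for r in inhabited)
    raise ValueError(f"unknown proposition type: {kind}")

# (subject, predicate) of the major and minor premises in each of the four figures.
FIGURES = {
    1: (("M", "P"), ("S", "M")),
    2: (("P", "M"), ("S", "M")),
    3: (("M", "P"), ("M", "S")),
    4: (("P", "M"), ("M", "S")),
}

def is_valid(mood, figure):
    """Mood letters are ordered major premise, minor premise, conclusion (S-P).

    Assumption: Boolean semantics, so universal statements carry no existential import.
    """
    major, minor = FIGURES[figure]
    for bits in product((False, True), repeat=len(REGIONS)):
        inhabited = [r for r, b in zip(REGIONS, bits) if b]
        if (holds(mood[0], *major, inhabited)
                and holds(mood[1], *minor, inhabited)
                and not holds(mood[2], "S", "P", inhabited)):
            return False  # counter-model: premises true, conclusion false
    return True
```

Consider the argument "All fish are mammals; all sharks are fish; so all sharks are mammals." It abstracts to AAA in figure 1, and `is_valid("AAA", 1)` accepts it even though its content is implausible. Exactly 15 of the 256 mood and figure combinations pass this check. Under the traditional reading with existential import, forms such as AAI-1 would also count as valid, so the convention matters.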

2025


AICOE at PerAnsSumm 2025: An Ensemble of Large Language Models for Perspective-Aware Healthcare Answer Summarization
Rakshith R | Mohammed Sameer Khan | Ankush Chopra
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)

The PerAnsSumm 2025 shared task at the CL4Health workshop focuses on generating structured, perspective-specific summaries to make health-related information more accessible. Given a healthcare community question-answering dataset in which each instance contains a question, its context, and multiple user answers, the task involves identifying relevant perspective categories, extracting spans for these perspectives, and generating concise summaries of the extracted spans. We fine-tuned open-source models such as Llama-3.2 3B, Llama-3.1 8B, and Gemma-2 9B, and also experimented with proprietary models including GPT-4o, o1, Gemini-1.5 Pro, and Gemini-2 Flash Experimental using few-shot prompting. Our best-performing approach used an ensemble strategy that combined span outputs from o1 (CoT) and Gemini-2 Flash Experimental. For overlapping perspectives, we prioritized Gemini. The final spans were summarized using Gemini, preserving the higher classification accuracy of o1 while leveraging Gemini's superior span extraction and summarization capabilities. This hybrid method secured fourth place on the final leaderboard among 100 participants and 206 submissions.
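The abstract does not spell out the exact combination rule. One plausible reading is sketched below. The union of perspectives predicted by either model is kept, and whenever both models predict the same perspective, Gemini's spans win. The function name `merge_spans` and the label strings are hypothetical, and this is an assumed reconstruction rather than the authors' code.

```python
from typing import Dict, List

# Hypothetical representation: perspective label -> list of extracted spans.
Spans = Dict[str, List[str]]

def merge_spans(o1_spans: Spans, gemini_spans: Spans) -> Spans:
    """Union of both models' perspectives; Gemini's spans override o1's on overlap."""
    # Later keys win in dict unpacking, so Gemini takes priority for shared labels.
    return {**o1_spans, **gemini_spans}
```

A perspective found only by o1 keeps o1's spans. A perspective found by both models takes Gemini's spans, which are then passed to the summarization step.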
