Nguyen Tran
2026
HCMUSDroneBoys at SemEval-2026 Task 11: Asymmetric Counterfactual Debiasing and Rank-Sensitive Logical Invariance Adaptation for Syllogistic Reasoning
Nguyen Tran | Duy Minh Dao Sy | Trung Kiet Huynh | Phu Hoa Pham | Phu Quy Nguyen Lam
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes our system for SemEval-2026 Task 11, Subtask 1: binary classification of syllogistic validity in English. The main challenge is the content effect, where language models confuse formal logical validity with how plausible the argument sounds. We propose three techniques that work together to separate logical form from semantic content: (1) Structure-Disentangled Prompting (SDP), which breaks syllogisms into premise-conclusion triples and uses a logic-first instruction template; (2) Asymmetric Counterfactual Debiasing (ACD), a data augmentation method that only generates valid-to-invalid counterfactual pairs, taking advantage of an asymmetry in validity composition to avoid label noise; and (3) Rank-Sensitive Logical Invariance Adaptation (RLIA), where we find that low-rank QLoRA adapters cannot simultaneously learn classification and suppress content-correlated shortcuts, and solve this by increasing adapter rank. Built on Qwen2.5-14B-Instruct, our system achieved a perfect Combined Score of 100.0 on the SemEval-2026 Task 11 Subtask 1 benchmark.
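The link between adapter rank and capacity in RLIA can be made concrete with a little arithmetic. For a single weight matrix of shape d_out x d_in, a LoRA adapter of rank r adds two factors with r * (d_in + d_out) trainable parameters in total, so the count grows linearly with rank. The sketch below is illustrative only: the function name and the 4096 x 4096 dimensions are assumptions, not the paper's configuration.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a rank-`rank` LoRA adapter adds to one d_out x d_in matrix.

    The adapter factors the update as B @ A, with A of shape (rank, d_in)
    and B of shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Hypothetical layer size, chosen only to show the scaling.
for rank in (8, 16, 64):
    print(rank, lora_params(4096, 4096, rank))
```

Raising the rank from 8 to 64 in this toy setting multiplies the adapter's parameter count by 8, which is the kind of added capacity the abstract appeals to.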
HCMUS RepeatedGames at SemEval-2026 Task 12: CausalRAG: Synergizing Causal Graph Retrieval and Extended LoRA for Abductive Reasoning
Duy Minh Dao Sy | Nguyen Tran | Trung Kiet Huynh | Phu Quy Nguyen Lam | Phu Hoa Pham
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents our system for SemEval-2026 Task 12: Abductive Event Reasoning (AER). The shared task asks systems to identify the most plausible cause of a real-world event from multiple-choice options, given retrieved documents as evidence. We propose a hybrid retrieval method that combines BM25 keyword matching with dense semantic search to capture explicit causal keywords. We also apply extended LoRA fine-tuning that trains both the attention and MLP layers of a 32-billion-parameter language model with only 0.81% of its parameters trainable. As a final refinement, we fine-tune on the development set to leverage validation data before inference. Our system scores 0.90 on the official test set, tying for fifth place among participating teams and improving on our baseline by 0.27.
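One common way to combine BM25 and dense retrieval scores is a weighted sum after rescaling each score list to a shared range. The abstract does not say how the two signals are fused, so the sketch below is an assumption rather than the authors' method; `min_max`, `fuse_scores`, and the weight `alpha` are hypothetical names, and the scores are taken as already computed.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 0.0 for doc, s in scores.items()}


def fuse_scores(
    bm25: Dict[str, float], dense: Dict[str, float], alpha: float = 0.5
) -> List[Tuple[str, float]]:
    """Rank documents by alpha * BM25 + (1 - alpha) * dense, both normalized.

    A document missing from one retriever contributes 0 for that signal.
    """
    b, d = min_max(bm25), min_max(dense)
    fused = {
        doc: alpha * b.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
        for doc in set(b) | set(d)
    }
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up scores for four document ids.
ranking = fuse_scores(
    {"a": 10.0, "b": 5.0, "c": 0.0},
    {"a": 0.2, "b": 0.9, "d": 0.6},
)
print([doc for doc, _ in ranking])
```

Normalizing first matters because raw BM25 scores are unbounded while dense similarities usually sit in a narrow range; without it, one signal would dominate the sum.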
5ting at SemEval-2026 Task 8: Strong End-to-End Multi-Turn RAG via LLM-Based Reranking and Faithfulness Control
Thien-Qua T-Nguyen | Chi Hoang | Nguyen Tran | Tri Le | Khanh Truong | Chinh Nguyen
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents a modular multi-turn Retrieval-Augmented Generation (RAG) system designed to mitigate hallucination, context drift, and underspecification. The pipeline combines dual-query merged retrieval and LLM-based reranking to deliver high-precision evidence, improving nDCG@5 by 17.7%. To strictly control hallucination during generation, we introduce a role-separated prompting strategy. This approach explicitly isolates the conversation history (used solely for intent and coreference resolution) from the retrieved passages (enforced as the exclusive source of factual grounding). By preventing the language model from misinterpreting prior dialogue turns as factual evidence, the system ranked 3/29 in the SemEval-2026 Task 8 end-to-end evaluation. Notably, our faithfulness-oriented design achieved a high ROUGE-L F1 score of 0.7692, outperforming larger baselines and demonstrating that explicit grounding constraints are highly effective at ensuring lexical faithfulness and reducing hallucinations.
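The role-separated prompting idea can be sketched as a prompt builder that places the conversation history and the retrieved passages in separately labeled blocks, each with its own usage rule. This is a minimal illustration under assumptions: the function name `build_prompt`, the section labels, and the instruction wording are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def build_prompt(
    history: List[Tuple[str, str]], passages: List[str], question: str
) -> str:
    """Assemble a prompt that keeps dialogue history and evidence in separate roles."""
    # History is rendered as "speaker: text" lines, one per turn.
    history_block = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    # Passages are numbered so an answer can point back to its evidence.
    passage_block = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "CONVERSATION HISTORY (use only to resolve intent and references; "
        "not a source of facts):\n"
        f"{history_block}\n\n"
        "RETRIEVED PASSAGES (the only permitted source of facts):\n"
        f"{passage_block}\n\n"
        f"QUESTION: {question}\n"
        "Answer using the retrieved passages only. "
        "If they do not contain the answer, say so."
    )


prompt = build_prompt(
    history=[("user", "Tell me about the workshop."), ("assistant", "Which one?")],
    passages=["The workshop is held annually."],
    question="How often is it held?",
)
print(prompt)
```

Keeping the two blocks apart lets the instruction attach a different rule to each, so earlier turns inform what is being asked without being treated as evidence.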