Trung Kiet Huynh
2026
HCMUSDroneBoys at SemEval-2026 Task 11: Asymmetric Counterfactual Debiasing and Rank-Sensitive Logical Invariance Adaptation for Syllogistic Reasoning
Nguyen Tran | Duy Minh Dao Sy | Trung Kiet Huynh | Phu Hoa Pham | Phu Quy Nguyen Lam
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes our system for SemEval-2026 Task 11, Subtask 1: binary classification of syllogistic validity in English. The main challenge is the content effect, where language models confuse formal logical validity with how plausible the argument sounds. We propose three techniques that work together to separate logical form from semantic content: (1) Structure-Disentangled Prompting (SDP), which breaks syllogisms into premise-conclusion triples and uses a logic-first instruction template; (2) Asymmetric Counterfactual Debiasing (ACD), a data augmentation method that only generates valid-to-invalid counterfactual pairs, taking advantage of an asymmetry in validity composition to avoid label noise; and (3) Rank-Sensitive Logical Invariance Adaptation (RLIA), where we find that low-rank QLoRA adapters cannot simultaneously learn classification and suppress content-correlated shortcuts, and solve this by increasing adapter rank. Built on Qwen2.5-14B-Instruct, our system achieved a perfect Combined Score of 100.0 on the SemEval-2026 Task 11 Subtask 1 benchmark.
HCMUS RepeatedGames at SemEval-2026 Task 12: CausalRAG: Synergizing Causal Graph Retrieval and Extended LoRA for Abductive Reasoning
Duy Minh Dao Sy | Nguyen Tran | Trung Kiet Huynh | Phu Quy Nguyen Lam | Phu Hoa Pham
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents our system for SemEval-2026 Task 12: Abductive Event Reasoning (AER). The shared task asks systems to identify the most plausible cause of a real-world event from multiple-choice options, given retrieved documents as evidence. We propose hybrid retrieval that combines BM25 keyword matching with dense semantic search to capture explicit causal keywords. We also apply extended LoRA fine-tuning that trains both the attention and MLP layers of a 32-billion-parameter language model while updating only 0.81% of its parameters. As a final refinement, we fine-tune on the development set to leverage validation data before inference. Our system scores 0.90 on the official test set, tying for fifth place among participating teams and improving on our baseline by 0.27.
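The abstract does not say how the BM25 and dense scores are combined. As a minimal sketch of one common option, the snippet below min-max normalizes both score lists and ranks documents by a weighted sum. The function names `min_max` and `hybrid_rank` and the weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(bm25_scores, dense_scores, alpha=0.5):
    """Return document indices sorted by a fused score, best first.

    alpha weights the normalized BM25 score and (1 - alpha) weights the
    normalized dense score. alpha is a hypothetical parameter.
    """
    if len(bm25_scores) != len(dense_scores):
        raise ValueError("score lists must have the same length")
    lexical = min_max(bm25_scores)
    semantic = min_max(dense_scores)
    fused = [alpha * b + (1 - alpha) * d for b, d in zip(lexical, semantic)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Example: three documents scored by both retrievers.
ranking = hybrid_rank([2.0, 0.0, 1.0], [0.1, 0.9, 0.5], alpha=0.7)
```

Normalizing before fusing matters because raw BM25 scores and cosine similarities live on different scales.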
HCMUS_PrompterXPrompter at AbjadMed: When Classification Meets Retrieval: Taming the Long Tail in Arabic Medical Text Classification
Duy Minh Dao Sy | Trung Kiet Huynh | Nguyen Dinh Ha Duong | Nguyen Chi Tran | Phu Quy Nguyen Lam | Hoa Pham Phu
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Medical text classification is high-stakes work, yet models often falter precisely where they are needed most: on rare, critical conditions buried in the long tail of the data distribution. In the EACL 2026 ABJAD-NLP Shared Task, we confronted this challenge with a dataset of Arabic medical questions heavily skewed towards a few common topics, leaving dozens of categories with fewer than ten examples. We present HybridMed, a system that effectively tames this long tail by marrying the semantic generalization of a fine-tuned Arabic BERT model with the precise, instance-based memory of k-nearest neighbor retrieval. This complementary union allowed our system to achieve a macro-F1 score of 0.4902, demonstrating that for diverse and imbalanced medical data, the whole is indeed greater than the sum of its parts.
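How the classifier and the k-nearest neighbor component are combined is not stated in the abstract. One plausible reading, sketched here under that assumption, interpolates the classifier's class probabilities with the label distribution of the k most similar training examples. The names `knn_distribution` and `interpolate`, the cosine similarity, and the mixing weight `lam` are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_distribution(query, memory, k, num_classes):
    """Label distribution over the k stored examples closest to the query.

    memory is a list of (embedding, label) pairs.
    """
    if not memory:
        raise ValueError("memory must contain at least one example")
    nearest = sorted(memory, key=lambda item: cosine(query, item[0]),
                     reverse=True)[:k]
    counts = Counter(label for _, label in nearest)
    return [counts[c] / len(nearest) for c in range(num_classes)]


def interpolate(classifier_probs, knn_probs, lam=0.5):
    """Mix the two distributions; lam is the weight on the kNN side."""
    return [lam * k + (1 - lam) * p
            for p, k in zip(classifier_probs, knn_probs)]


# A rare class (label 2) with a single stored example can still win.
memory = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 2)]
knn_probs = knn_distribution([0.0, 1.0], memory, k=1, num_classes=3)
mixed = interpolate([0.6, 0.3, 0.1], knn_probs, lam=0.5)
```

In this toy case the classifier alone prefers the frequent class, while the retrieved neighbor shifts the prediction to the rare one, which is the long-tail behavior the abstract describes.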
HCMUS_PrisonDilemma at AbjadAuthorID Shared Task: Less is More with Base Models
Trung Kiet Huynh | Duy Minh Dao Sy | Nguyen Chi Tran | Pham Phu Hoa | Nguyen Lam Phu Quy | Truong Bao Tran
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
We present our approach to the AbjadNLP 2026 Arabic Authorship Identification shared task, achieving 4th place. Our key finding is that AraBERT-base (110M) outperforms AraBERT-large (340M) on the test set with macro F1 of 0.8449 versus 0.8096, despite lower validation scores. We handle long passages via sliding window chunking with mean pooling, and use a two-stage classification head with dual dropout for regularization. Per-class analysis reveals that translated works achieve perfect F1 while classical poets remain challenging due to shared formal structures. Our results challenge the "scale is all you need" assumption for stylometric tasks.
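Sliding window chunking with mean pooling can be sketched as follows. The window and stride values, and the choice to average one vector per chunk, are assumptions for illustration; the abstract does not give these details, and the chunk vectors would come from an encoder that is not shown here.

```python
def sliding_windows(tokens, window=512, stride=256):
    """Split a token list into overlapping chunks of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last chunk reached the end of the passage
        start += stride
    return chunks


def mean_pool(vectors):
    """Average equal-length vectors element-wise into one passage vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Ten tokens, window of 4, stride of 2 -> four overlapping chunks.
chunks = sliding_windows(list(range(10)), window=4, stride=2)
# Stand-in chunk embeddings; a real system would encode each chunk.
passage_vector = mean_pool([[1.0, 2.0], [3.0, 4.0]])
```

Overlapping windows keep context that would otherwise be cut at chunk boundaries, at the cost of encoding some tokens twice.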
HCMUS_The Fangs at AbjadStyleTransfer Shared Task: Learning to Query Style, Contrastive Representations for Zero-Shot Arabic Authorship Style Transfer
Duy Minh Dao Sy | Trung Kiet Huynh | Nguyen Chi Tran | Nguyen Lam Phu Quy | Pham Phu Hoa | Nguyen Dinh Ha Duong
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
This paper describes the system developed by team HCMUS_The Fangs for the AbjadStyleTransfer shared task (ArabicNLP 2026), where we achieved 1st place. We present a contrastive style learning approach for zero-shot Arabic authorship style transfer. Our key discovery is that the 21 test authors, including Nobel laureate Naguib Mahfouz and literary pioneer Taha Hussein, have zero overlap with the 32,784 training authors, transforming this into a pure zero-shot challenge. This insight led us to develop a dual-encoder architecture that learns transferable style representations through contrastive objectives, rather than memorizing author-specific patterns. Our system achieves 19.77 BLEU and 55.74 chrF, outperforming retrieval-augmented generation (+18%) and multi-task learning (+31%). Counter-intuitively, we find that sophisticated architectural modifications like style injection consistently degrade performance, while simpler approaches that preserve pre-trained knowledge excel. Our analysis reveals that for famous authors, pre-trained Arabic language models already encode substantial stylistic knowledge; the key is surfacing it, not learning from scratch.
HCMUS_TheFangs at AbjadGenEval Shared Task: Weighted Layer Pooling with Attention Fusion for Arabic AI-Generated Text Detection
Duy Minh Dao Sy | Nguyen Chi Tran | Trung Kiet Huynh | Nguyen Lam Phu Quy | Pham Phu Hoa | Nguyen Dinh Ha Duong
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
The rapid advancement of large language models poses significant challenges for content authenticity, particularly in under-resourced languages where detection tools remain scarce. We present our winning system for the AbjadGenEval shared task on Arabic AI-generated text detection. Our key insight is that AI-generated text exhibits distinctive patterns across multiple linguistic levels, from local syntax to global semantics, that can be captured by learning to fuse representations from different transformer layers. We introduce a Weighted Layer Pooling mechanism that learns optimal layer combinations, combined with Attention Pooling for sequence-level context aggregation. Through systematic experimentation with 15+ approaches, we make a surprising discovery: model architecture selection dominates over sophisticated training techniques, with DeBERTa-v3 providing +27% relative improvement over AraBERT regardless of training strategy. Our system achieves 0.93 F1-score, securing 1st place among all participants and outperforming the runner-up by 3 absolute points.
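The two pooling steps named in the abstract can be illustrated in plain Python. This is a sketch under assumptions: layer weights are taken as a softmax over one logit per layer, and attention pooling scores each token by a dot product with a single query vector. In a trained model the logits and the query would be learned parameters; here they are fixed inputs, and the function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_pool(layer_states, layer_logits):
    """Fuse hidden states from several layers into one state per token.

    layer_states has shape [layers][tokens][hidden]; layer_logits holds
    one (normally learned) scalar per layer.
    """
    weights = softmax(layer_logits)
    num_tokens = len(layer_states[0])
    hidden = len(layer_states[0][0])
    return [[sum(w * layer[t][h] for w, layer in zip(weights, layer_states))
             for h in range(hidden)]
            for t in range(num_tokens)]


def attention_pool(token_states, query):
    """Collapse token states into one vector using dot-product attention."""
    scores = [sum(q * x for q, x in zip(query, tok)) for tok in token_states]
    attn = softmax(scores)
    hidden = len(token_states[0])
    return [sum(a * tok[h] for a, tok in zip(attn, token_states))
            for h in range(hidden)]


# Two layers, two tokens, hidden size 2; equal logits give equal weights.
states = [[[1.0, 0.0], [0.0, 1.0]],
          [[3.0, 0.0], [0.0, 3.0]]]
fused = weighted_layer_pool(states, [0.0, 0.0])
sentence_vector = attention_pool(fused, [0.0, 0.0])
```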
2025
Challenge Track: JHARNA-MT: A Copy-Augmented Hybrid of LoRA-Tuned NLLB and Lexical SMT with Minimum Bayes Risk Decoding for Low-Resource Indic Languages
Dao Sy Duy Minh | Trung Kiet Huynh | Tran Chi Nguyen | Phu Quy Nguyen Lam | Phu-Hoa Pham | Nguyễn Đình Hà Dương | Dien Dinh | Long HB Nguyen
Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)
This paper describes JHARNA-MT, our system for the MMLoSo 2025 Shared Task on translation between high-resource languages (Hindi, English) and four low-resource Indic tribal languages: Bhili, Gondi, Mundari, and Santali. The task poses significant challenges, including data sparsity, morphological richness, and structural divergence across language pairs. To address these, we propose a hybrid translation pipeline that integrates non-parametric retrieval, lexical statistical machine translation (SMT), and LoRA-tuned NLLB-200 neural machine translation under a unified Minimum Bayes Risk (MBR) decoding framework. Exact and fuzzy retrieval exploit redundancy in government and administrative texts, SMT with diagonal alignment priors and back-translation provides lexically faithful hypotheses, and the NLLB-LoRA component contributes fluent neural candidates. MBR decoding selects consensus translations using a metric-matched utility based on a weighted combination of BLEU and chrF, mitigating the complementary error modes of SMT and NMT. Our final system, further enhanced with script-aware digit normalization and entity-preserving post-processing, achieves a private leaderboard score of 186.37 and ranks 2nd overall in the shared task, with ablation studies confirming the contribution of each component.
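Minimum Bayes Risk selection over a pool of candidate translations can be sketched as below: each candidate is scored by its average utility against the other candidates, and the one with the highest average is returned. The abstract's utility is a weighted combination of BLEU and chrF; to keep the example self-contained, a simple character-bigram F1 (`char_f1`) stands in for it. That substitute, and the function names, are assumptions rather than the authors' metric or code.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Multiset of character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def char_f1(hypothesis, reference, n=2):
    """Character n-gram F1; a simplified stand-in for a BLEU/chrF mix."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=char_f1):
    """Return the candidate with the highest mean utility against the rest."""
    if not candidates:
        raise ValueError("need at least one candidate")

    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


# Candidates could come from retrieval, SMT, and NMT; two of them agree.
choice = mbr_select(["the cat sat", "the cat sat", "a dog ran"])
```

Because the score rewards agreement with the rest of the pool, an outlier produced by one component is unlikely to be selected, which is how consensus decoding can offset component-specific errors.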