Huy Le
2026
VAP-GameController at SemEval-2026 Task 2: Lexical-based and Emotion-Aware Approaches for Longtitudinal Emotion Prediction
Huy Le | Truong Phu | Trung Tran | Nga Nguyen | Monojit Choudhury
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
In this work, we participate in SemEval-2026 Task 2, which focuses on predicting continuous valence and arousal trajectories from longitudinal ecological essays. To model fine-grained emotional dynamics, we explore three approaches: (1) hierarchical encoder-based models to capture contextual emotional patterns, (2) a lexicon-based pipeline with linguistic rules and a dual-level calibration mechanism for personalized estimation, and (3) a hybrid framework that integrates lexical emotional signals into neural encoders. Experiments on the official dataset, evaluated using Pearson correlation (r) and MAE, show consistent improvements over baseline methods, highlighting the complementary strengths of neural representations and calibrated lexical features.
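The two evaluation metrics named in the abstract, Pearson correlation (r) and MAE, can be illustrated with a minimal sketch. The function names `pearson_r` and `mae` and the toy score lists are assumptions made for illustration; this is not the task's official scorer, and how scores are grouped per user or per essay is not specified here.

```python
import math


def pearson_r(pred, gold):
    """Pearson correlation between predicted and gold scores."""
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_g)


def mae(pred, gold):
    """Mean absolute error between predicted and gold scores."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("need two non-empty, equal-length lists")
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)


if __name__ == "__main__":
    # Hypothetical valence predictions and gold labels for one trajectory.
    predicted = [0.5, 1.0, -0.5, 0.0]
    reference = [1.0, 1.5, -1.0, 0.5]
    print("r   =", pearson_r(predicted, reference))
    print("MAE =", mae(predicted, reference))
```

The two metrics are complementary: r rewards tracking the shape of a trajectory regardless of offset, while MAE penalizes absolute deviation from the gold values.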
Grammatical Error Correction for Low-Resource Languages: The Case of Zarma
Mamadou K. Keita | Adwoa Bremang | Huy Le | Dennis Owusu | Marcos Zampieri | Christopher Homan
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Grammatical error correction (GEC) aims to improve text quality and readability. Previous work on the task focused primarily on high-resource languages, while low-resource languages lack robust tools. To address this shortcoming, we present a study on GEC for Zarma, a language spoken by over five million people in West Africa. We compare three approaches: rule-based methods, machine translation (MT) models, and large language models (LLMs). We evaluated GEC models using a dataset of more than 250,000 examples, including synthetic and human-annotated data. Our results showed that the MT-based approach using M2M100 outperforms the others, with a detection rate of 95.82% and a suggestion accuracy of 78.90% in automatic evaluations (AE) and an average score of 3.0 out of 5.0 in manual evaluation (ME) from native speakers for grammar and logical corrections. The rule-based method was effective for spelling errors but failed on complex context-level errors. The LLMs (Gemma 2b and MT5-small) showed moderate performance. Our work supports the use of MT models to enhance GEC in low-resource settings, and we validated these results with Bambara, another West African language.
NSL-MT: Linguistically Informed Negative Samples for Efficient Machine Translation in African Low-Resource Languages
Mamadou K. Keita | Christopher M Homan | Huy Le
Findings of the Association for Computational Linguistics: ACL 2026
We introduce negative space learning machine translation (NSL-MT), a training method for underresourced languages that augments limited parallel data with synthetically generated violations of the target language’s grammar and explicitly penalizes the model when it assigns high probability to these linguistically invalid outputs. NSL-MT delivers improvements across all baselines we tested, including 3-12% BLEU gains for well-performing models and 56-89% gains for models lacking decent initial support. Furthermore, NSL-MT provides a 5x data efficiency multiplier: training with 1,000 examples matches or exceeds normal training with 5,000 examples. NSL-MT thus provides a data-efficient alternative training method for settings where parallel data is limited.
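The idea of penalizing probability mass on invalid outputs can be sketched as a loss with two terms. The abstract does not give the exact objective, so everything below is an assumption: the names `nll`, `negative_penalty`, and `combined_loss`, the `weight` parameter, and the unlikelihood-style penalty -log(1 - P(negative)) are one plausible reading, not the authors' formulation. Token probabilities are passed in directly so that the sketch runs without a model.

```python
import math


def nll(token_probs):
    """Negative log-likelihood of the reference translation."""
    return -sum(math.log(p) for p in token_probs)


def negative_penalty(token_probs, eps=1e-12):
    """Penalty that grows as the model's probability for an invalid output rises.

    Assumed form: -log(1 - P(sequence)), where P(sequence) is the product of
    the per-token probabilities.
    """
    seq_prob = math.prod(token_probs)
    return -math.log(max(1.0 - seq_prob, eps))


def combined_loss(ref_probs, negatives, weight=1.0):
    """Reference NLL plus a weighted penalty over synthetic negative samples.

    `negatives` is a list of per-token probability lists, one per synthetic
    grammar violation (hypothetical input format).
    """
    penalty = sum(negative_penalty(n) for n in negatives)
    return nll(ref_probs) + weight * penalty


if __name__ == "__main__":
    reference = [0.9, 0.8, 0.95]       # model probabilities on the gold tokens
    unlikely_neg = [[0.1, 0.2, 0.1]]   # model already rejects the violation
    likely_neg = [[0.9, 0.9, 0.9]]     # model wrongly favors the violation
    print(combined_loss(reference, unlikely_neg))
    print(combined_loss(reference, likely_neg))  # larger loss
```

Under this assumed form, a negative sample the model already considers improbable contributes almost nothing, so the training signal concentrates on violations the model currently finds plausible.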