Minh-Hoang Le

2026

AlphaLyrae at SemEval-2026 Task 9: Metric Learning and Asymmetric Loss for Chinese Polarization Analysis
Minh-Hoang Le | Khoan Phung
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

For the Chinese track of SemEval-2026 Task 9 (Detecting Online Polarization), we address two key challenges: polarized content frequently uses implicit language (e.g., homophones and coded terms) to evade moderation, and class distributions exhibit severe long-tail imbalance. We propose a metric learning approach that frames polarization detection as semantic similarity matching, which captures implicit language patterns better than linear decision boundaries. We fine-tune an ERNIE-3.0 encoder with SoftTriple loss and apply k-NN retrieval for binary detection (Subtask 1). For multi-label categorization (Subtasks 2 and 3), we transfer learned representations from the detection model and fine-tune with Asymmetric Loss. A priority-based stratified cross-validation strategy ensures minority classes appear across all training folds despite extreme label skew. Evaluated on the official 1,927-sample test set using an end-to-end pipeline, our system achieved Macro-F1 scores of 0.9190 (Rank 6) on Polarization Detection, 0.8244 (Rank 5) on Type Classification, and 0.6670 (Rank 4) on Manifestation Identification.
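
As a rough illustration of the Asymmetric Loss mentioned in the abstract, the sketch below computes the loss for a single multi-label sample in plain Python. The function name `asymmetric_loss` and the values of `gamma_pos`, `gamma_neg`, and `margin` are assumptions chosen for illustration; the abstract does not state the settings the authors used.

```python
import math

def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Asymmetric Loss for one multi-label sample, summed over labels.

    probs   -- predicted probability per label (after a sigmoid)
    targets -- 1 if the label is present, 0 otherwise
    All hyperparameter defaults are illustrative assumptions.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive term: focal-style down-weighting with gamma_pos.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative term: shift the probability by the margin so that
            # very easy negatives (p <= margin) contribute nothing, then
            # down-weight the remaining negatives with gamma_neg.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total
```

With a larger `gamma_neg` than `gamma_pos`, the many easy negative labels in a long-tailed label set contribute far less to the gradient than the few positive labels, which is the motivation usually given for this loss.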


KvochurHegel at AbjadMed: Combining LDAM Loss and Adversarial Training for Arabic Medical Question-Answer Classification
Minh-Hoang Le
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script

This paper describes our team’s submission to AbjadMed at AbjadNLP 2026. The task involves classifying Arabic medical question-answer pairs into 82 categories, characterized by a long-tail distribution and significant semantic overlap. While domain-specific Arabic models exist, they are primarily optimized for Named Entity Recognition or span-extraction tasks rather than high-cardinality sequence classification. Consequently, our system adopts a robust optimization approach using a general-purpose encoder. We utilize ARBERTv2 as the backbone, employing Label-Distribution-Aware Margin (LDAM) loss to mitigate class imbalance and Fast Gradient Method (FGM) adversarial training to enhance generalization boundaries. Our approach achieves a Macro-F1 score of 0.4028 on the private test set, demonstrating that advanced optimization techniques can yield competitive performance on specialized taxonomies without requiring domain-specific pre-training.
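
To make the role of the LDAM loss concrete, the following minimal sketch computes per-class margins proportional to n_j^(-1/4), where n_j is the number of training samples in class j, and applies the margin to the true-class logit before a scaled cross-entropy. The helper names `ldam_margins` and `ldam_loss` and the values of `max_margin` and `scale` are assumptions for illustration, not the settings reported for this system.

```python
import math

def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j ** -0.25.

    The rarest class receives max_margin (an illustrative value);
    more frequent classes receive proportionally smaller margins.
    """
    raw = [1.0 / (n ** 0.25) for n in class_counts]
    factor = max_margin / max(raw)
    return [r * factor for r in raw]

def ldam_loss(logits, target, margins, scale=30.0):
    """Cross-entropy for one sample after subtracting the class margin
    from the true-class logit. `scale` is an illustrative assumption."""
    adjusted = list(logits)
    adjusted[target] -= margins[target]
    z = [scale * v for v in adjusted]
    # Numerically stable log-sum-exp.
    m = max(z)
    log_sum = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum - z[target]
```

Because the margin grows as a class gets rarer, the model is pushed to separate tail classes more widely than head classes, which is how the loss counteracts a long-tail label distribution.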

Co-authors

Khoan Phung 1
