Phan Phat


2026

We address the extreme label imbalance in the Hausa dataset, where only 11% of instances are polarized, through the Augment-Judge-Train (AJT) pipeline. By leveraging Gemini 2.0 for taxonomy-driven data generation and an LLM-as-a-Judge layer for quality control, we expanded the minority class sixfold. Our ensemble architecture, which combines specialized encoders with LLM-LoRA, placed 1st in Hausa (0.8336 Macro-F1) and ranked in the Top 10 for English. These results demonstrate the efficacy of culture-aware synthetic data in enhancing social NLP for low-resource languages.
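To make the effect of the sixfold expansion concrete, the short sketch below works through the arithmetic. It assumes, which the abstract does not state, that "sixfold" means the minority-class count after augmentation is six times the original count and that the majority class is left unchanged. The function name is hypothetical and used only for illustration.

```python
def minority_share_after_expansion(minority_share: float, factor: float) -> float:
    """Return the minority-class share after its count is multiplied by `factor`.

    Assumption (not from the source): the majority-class count stays fixed.
    """
    if not 0.0 < minority_share < 1.0:
        raise ValueError("minority_share must lie strictly between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    minority = minority_share * factor   # expanded minority mass
    majority = 1.0 - minority_share      # unchanged majority mass
    return minority / (minority + majority)


# 11% polarized before augmentation, minority class expanded sixfold.
share = minority_share_after_expansion(0.11, 6.0)
print(f"{share:.3f}")  # prints 0.426
```

Under these assumptions the polarized class would grow from 11% to roughly 43% of the training data, which is much closer to balance.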