Yuqian Dai


2026

Pre-trained speech models like Whisper demonstrate impressive performance under ideal conditions but still face robustness challenges in low-resource language scenarios. We introduce Meta Curriculum Optimization for Robust ASR (MetaCORA), a novel meta-curriculum adaptive framework that improves speech recognition for low-resource Hong Kong Cantonese by integrating adversarial training with feature contrastive learning. Our approach dynamically adjusts three critical hyperparameters: the adversarial perturbation magnitude, the optimization step size, and the contrastive learning temperature, allowing the model to adapt to varying training difficulties throughout the learning process. Unlike traditional meta-learning approaches, our framework does not rely on end-to-end differentiability but instead uses validation performance as a signal to guide hyperparameter adjustments. Experimental results demonstrate that our approach achieves a lower Word Error Rate (WER) than standard Whisper fine-tuning, commercial speech recognition systems, and LLM-based methods. Ablation studies confirm the necessity of each component, as removing any single element leads to a measurable drop in performance. The model also exhibits robustness under noisy conditions, achieving consistently lower WER than baseline systems. Further analysis shows that MetaCORA effectively compresses the distance between adversarial feature representations while maintaining well-separated class boundaries in the embedding space, providing a mechanistic explanation for its improvement.
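
As a rough illustration of the validation-guided curriculum described above, the sketch below shows one possible PyTorch formulation. The perturbation scheme, the multiplicative update rule, and all names (asr_loss, adjust_hyperparams, and so on) are assumptions made for exposition, not the actual MetaCORA implementation.

```python
# Illustrative sketch only: the abstract does not specify MetaCORA's exact update
# rules, so the FGSM-style perturbation and the multiplicative adjustments below
# are assumptions, as are the helper names (asr_loss, adjust_hyperparams).
import torch
import torch.nn.functional as F

def adversarial_features(model, feats, labels, epsilon):
    """Perturb input features in the direction that increases the ASR loss."""
    feats = feats.clone().detach().requires_grad_(True)
    loss = model.asr_loss(feats, labels)              # assumed loss interface
    grad, = torch.autograd.grad(loss, feats)
    return (feats + epsilon * grad.sign()).detach()

def contrastive_loss(clean_emb, adv_emb, tau):
    """Pull clean and adversarial embeddings of the same utterance together."""
    clean = F.normalize(clean_emb, dim=-1)
    adv = F.normalize(adv_emb, dim=-1)
    logits = clean @ adv.T / tau                      # pairwise similarities
    targets = torch.arange(len(clean), device=logits.device)
    return F.cross_entropy(logits, targets)

def adjust_hyperparams(epsilon, lr, tau, val_wer, prev_wer):
    """Validation WER is the only signal; no end-to-end differentiability needed."""
    if val_wer < prev_wer:        # improving: make the curriculum harder
        epsilon *= 1.1
    else:                         # regressing: ease off and shrink the step size
        epsilon *= 0.9
        lr *= 0.9
        tau *= 0.95
    return epsilon, lr, tau
```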

2025

Large Language Models (LLMs) have improved performance across various natural language processing tasks. Despite these improvements, LLMs continue to face significant challenges, such as grammatical errors and code-switching into English, when applied to low-resource languages like Cantonese in Machine Translation (MT) scenarios. By addressing the unique linguistic and contextual challenges of Cantonese, we present a novel strategy to improve the understanding and translation capabilities of LLMs for Cantonese-to-Mandarin MT. Our strategy comprises three key components: (1) Syntax and Part-of-Speech (POS) fine-tuning, where we use the Universal Dependencies (UD) corpus to fine-tune the LLM, focusing on the linguistic structures of Cantonese; (2) Specialized Cantonese-to-Mandarin sentence pairs, collected from diverse sources such as Cantonese grammar textbooks and manually translated sentences across various domains, to expose the model to a wide range of linguistic contexts; (3) Post-processing with additional LLMs, where we introduce additional LLMs to refine the initial translations, correcting Mandarin grammar and punctuation. Empirical evaluations on human-created test sets show that our proposed strategy improves translation performance and outperforms existing commercial translation models by at least 3 BLEU points. Additionally, our strategy benefits other LLMs and the reversed translation direction, demonstrating its generalization ability and effectiveness.
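
The following sketch lays out the three-stage strategy in schematic Python. The prompt wording and the helper names (build_pos_finetune_example, base_llm, post_edit_llm) are hypothetical placeholders and are not taken from the paper.

```python
# Illustrative sketch of the three-stage strategy described above; prompts and
# helper names are hypothetical, not the paper's actual implementation.

def build_pos_finetune_example(cantonese_sentence, ud_tokens):
    """Stage 1: turn a UD-annotated Cantonese sentence into a fine-tuning example."""
    tagged = " ".join(f"{tok['form']}/{tok['upos']}" for tok in ud_tokens)
    return {
        "instruction": "Label each Cantonese token with its POS tag.",
        "input": cantonese_sentence,
        "output": tagged,
    }

def translate(cantonese_sentence, base_llm, post_edit_llm):
    """Stages 2-3: draft a translation with the fine-tuned LLM, then post-edit it."""
    draft = base_llm(
        f"Translate this Cantonese sentence into Mandarin:\n{cantonese_sentence}"
    )
    # Stage 3: a second LLM corrects Mandarin grammar and punctuation only.
    return post_edit_llm(
        "Correct the grammar and punctuation of this Mandarin sentence "
        f"without changing its meaning:\n{draft}"
    )
```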

2022

Pre-trained transformer-based models, such as BERT, have shown excellent performance on most natural language processing benchmarks, but we still lack a good understanding of BERT's linguistic knowledge in Neural Machine Translation (NMT). Our work uses syntactic probes and Quality Estimation (QE) models to analyze how well BERT handles syntactic dependencies and how they affect machine translation quality, exploring which kinds of syntactic dependencies are difficult for BERT-based NMT engines. While our probing experiments confirm that pre-trained BERT “knows” about syntactic dependencies, its ability to recognize them often decreases after fine-tuning for NMT tasks. We also detect a relationship between syntactic dependencies in three languages and the quality of their translations, showing which specific syntactic dependencies are likely to be significant causes of low-quality translations.
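
A minimal sketch of one way such a dependency probe could be set up over frozen BERT features is shown below; the probe architecture, label set, model checkpoint, and token indices are assumptions for illustration, not the paper's exact experimental setup.

```python
# Minimal sketch of a dependency-relation probe over frozen BERT features,
# under assumed choices (linear probe, bert-base-cased, UD relation labels).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")   # or an NMT-fine-tuned encoder
bert.eval()

class DependencyProbe(nn.Module):
    """Predict the dependency relation from (head, dependent) token vectors."""
    def __init__(self, hidden_size, num_relations):
        super().__init__()
        self.classifier = nn.Linear(2 * hidden_size, num_relations)

    def forward(self, head_vec, dep_vec):
        return self.classifier(torch.cat([head_vec, dep_vec], dim=-1))

with torch.no_grad():
    enc = tokenizer("The cat sat on the mat", return_tensors="pt")
    hidden = bert(**enc).last_hidden_state[0]          # (seq_len, hidden_size)

probe = DependencyProbe(bert.config.hidden_size, num_relations=37)  # 37 UD v2 relations
# Score the relation between "sat" (head) and "cat" (dependent); the token
# indices below follow this tokenizer's output and are illustrative only.
logits = probe(hidden[3], hidden[2])
```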