Kuan-Tang Huang
2026
Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
Peng An-Ci | Kuan-Tang Huang | Tien-Hong Lo | Hung-Shin Lee | Hsin-Min Wang | Berlin Chen
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Taiwanese Hakka is a low-resource, endangered language that poses significant challenges for automatic speech recognition (ASR), including high dialectal variability and the presence of two distinct writing systems (Hanzi and Pinyin). Traditional ASR models often encounter difficulties in this context, as they tend to conflate essential linguistic content with dialect-specific variations across both phonological and lexical dimensions. To address these challenges, we propose a unified framework grounded in the Recurrent Neural Network Transducer (RNN-T). Central to our approach is the introduction of dialect-aware modeling strategies designed to disentangle dialectal "style" from linguistic "content", which enhances the model's capacity to learn robust and generalized representations. Additionally, the framework employs parameter-efficient prediction networks to concurrently model both ASR tasks (Hanzi and Pinyin). We demonstrate that these tasks create a powerful synergy, wherein the cross-script objective serves as a mutual regularizer that improves the primary ASR tasks. Experiments conducted on the HAT corpus reveal that our model achieves 57.00% and 40.41% relative error rate reductions on Hanzi and Pinyin ASR, respectively. To our knowledge, this is the first systematic investigation into the impact of Hakka dialectal variations on ASR and the first single model capable of jointly addressing these tasks.
2025
CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition
Hung-Yang Sung | Chien-Chun Wang | Kuan-Tang Huang | Tien-Hong Lo | Yu-Sheng Tsao | Yung-Chang Hsu | Berlin Chen
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Automatic speech recognition (ASR) for low-resource languages such as Taiwanese Hokkien is difficult due to the scarcity of annotated data. Moreover, direct fine-tuning on Han-character transcriptions often fails to capture detailed phonetic and tonal cues, while training only on romanization lacks lexical and syntactic coverage. In addition, prior studies have rarely explored staged strategies that integrate both annotation types. To address this gap, we present CLiFT-ASR, a cross-lingual fine-tuning framework that builds on Mandarin HuBERT models and progressively adapts them to Taiwanese Hokkien. The framework employs a two-stage process: it first learns acoustic and tonal representations from phonetic Tai-lo annotations and then captures vocabulary and syntax from Han-character transcriptions. This progressive adaptation enables effective alignment between speech sounds and orthographic structures. Experiments on the TAT-MOE corpus demonstrate that CLiFT-ASR achieves a 24.88% relative reduction in character error rate (CER) compared with strong baselines. The results indicate that CLiFT-ASR provides an effective and parameter-efficient solution for Taiwanese Hokkien ASR and that it has potential to benefit other low-resource language scenarios.
A Channel-Aware Anomaly-Guided Data Augmentation Framework for the FSR-2025 Hakka Speech Recognition Challenge
Siang-Ting Lin | Arthur Hao | Chiun-Yu Hua | Kuan-Tang Huang | Berlin Chen
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
The Formosa Speech Recognition Challenge 2025 (FSR-2025) focuses on Taiwanese Hakka, a low-resource language with limited data diversity and channel coverage. To address this challenge, we propose a channel-aware, data-centric framework that leverages multilingual foundation models to mitigate mismatches between field recordings and training data. Our method integrates unsupervised anomaly detection with channel-conditioned augmentation to enhance data representativeness before ASR fine-tuning, with the aim of improving robustness in low-resource Hakka speech recognition.