The NPTU ASR System for FSR2025 Hakka Character/Pinyin Recognition: Whisper with mBART Post-Editing and RNNLM Rescoring
Yi-Chin Huang, Yu-Heng Chen, Jian-Hua Wang, Hsiu-Chi Wu, Chih-Chung Kuo, Chao-Shih Huang, Yuan-Fu Liao
Abstract
This paper presents our system for the FSR-2025 Hakka Automatic Speech Recognition (ASR) Challenge, which consists of two sub-tasks: (i) Hakka Characters and (ii) Hakka Pinyin. We propose a unified architecture built upon Whisper [1], a large weakly supervised ASR model, as the acoustic backbone, with optional LoRA (Low-Rank Adaptation [2]) for parameter-efficient fine-tuning. Data augmentation techniques include the MUSAN [3] corpus (music/speech/noise) and tempo/speed perturbation [4]. For the character task, mBART-50 [5,6], a multilingual sequence-to-sequence model, is applied for text correction, while both tasks employ an RNNLM [7] for N-best rescoring. Under the final evaluation setting of the character task, mBART-driven 10-best text correction combined with RNNLM rescoring achieved a CER (Character Error Rate) of 6.26%, whereas the official leaderboard reported 22.5%. For the Pinyin task, the Medium model proved more suitable than the Large model given the dataset size and accent distribution. With 10-best RNNLM rescoring, it achieved a SER (Syllable Error Rate) of 4.65% on our internal warm-up test set, and the official final score (with tone information) was 14.81%. Additionally, we analyze the contribution of LID (Language Identification) for accent recognition across different recording and media sources.
- Anthology ID:
- 2025.rocling-main.63
- Volume:
- Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
- Month:
- November
- Year:
- 2025
- Address:
- National Taiwan University, Taipei City, Taiwan
- Editors:
- Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
- Venue:
- ROCLING
- Publisher:
- Association for Computational Linguistics
- Pages:
- 518–522
- URL:
- https://preview.aclanthology.org/dashboard/2025.rocling-main.63/
- Cite (ACL):
- Yi-Chin Huang, Yu-Heng Chen, Jian-Hua Wang, Hsiu-Chi Wu, Chih-Chung Kuo, Chao-Shih Huang, and Yuan-Fu Liao. 2025. The NPTU ASR System for FSR2025 Hakka Character/Pinyin Recognition: Whisper with mBART Post-Editing and RNNLM Rescoring. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 518–522, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
- Cite (Informal):
- The NPTU ASR System for FSR2025 Hakka Character/Pinyin Recognition: Whisper with mBART Post-Editing and RNNLM Rescoring (Huang et al., ROCLING 2025)
- PDF:
- https://preview.aclanthology.org/dashboard/2025.rocling-main.63.pdf
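
The abstract mentions N-best rescoring with a language model and reports results as CER and SER. As a rough illustration of those two ideas only, the sketch below picks the best hypothesis from an N-best list by adding a weighted language-model log-probability to the ASR score, then computes an edit-distance-based error rate. Everything here is an assumption made for illustration: the names `ToyUnigramLM`, `rescore`, `error_rate`, and `lm_weight` are hypothetical, the add-one-smoothed unigram model is a stand-in for the RNNLM used in the paper, and the tokens and scores are placeholders rather than Hakka data or results from the paper.

```python
import math
from collections import Counter


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edits divided by reference length: CER over characters, SER over syllables."""
    return edit_distance(ref, hyp) / len(ref)


class ToyUnigramLM:
    """Add-one-smoothed unigram model (a stand-in, NOT the paper's RNNLM)."""

    def __init__(self, corpus):
        self.counts = Counter(tok for sent in corpus for tok in sent)
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen tokens

    def log_prob(self, tokens):
        return sum(
            math.log((self.counts[t] + 1) / (self.total + self.vocab))
            for t in tokens
        )


def rescore(nbest, lm, lm_weight=0.5):
    """Return the (tokens, asr_log_score) pair with the highest combined score.

    lm_weight is a hypothetical interpolation weight; in practice it would
    be tuned on held-out data.
    """
    return max(nbest, key=lambda item: item[1] + lm_weight * lm.log_prob(item[0]))


# Placeholder tokens and scores, for illustration only.
lm = ToyUnigramLM([["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]])
nbest = [(["a", "x", "c"], -1.0), (["a", "b", "c"], -1.2)]
reference = ["a", "b", "c"]

first_pass = nbest[0][0]
rescored = rescore(nbest, lm)[0]
print(error_rate(reference, first_pass), error_rate(reference, rescored))
```

In this toy setup the top ASR hypothesis contains a token the language model has never seen, so the second hypothesis wins after rescoring. Setting `lm_weight=0` falls back to the ASR ranking.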