Yi-Chin Huang

2023

Whisper Model Adaptation for FSR-2023 Hakka Speech Recognition Challenge
Yi-Chin Huang | Ji-Qian Tsai
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)

2022

Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Yung-Chun Chang | Yi-Chin Huang
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

A Preliminary Study on Mandarin-Hakka neural machine translation using small-sized data
Yi-Hsiang Hung | Yi-Chin Huang
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

In this study, we implemented a machine translation system that uses a convolutional neural network with an attention mechanism to translate Mandarin into Sixian-accent Hakka. Specifically, to cope with the different idioms and terms used in the Northern and Southern Sixian accents, we analyzed the differences in the corpora and lexicon definitions, and then separated the differing word usages to train a dedicated model for each accent. In addition, because the collected Hakka corpora are relatively limited, unseen words occur frequently in real-world translation. In our system, we selected a suitable threshold for each model, based on model verification, to reject unsuitable translated words. Then, by applying the proposed algorithm, which adopts forced segmentation of Hakka idioms and terms together with common Mandarin word substitution, the resulting translated sentences become more intelligible. The proposed system thus achieved promising results using small-sized data. It could be used for Hakka language teaching and as the front end of Mandarin-Hakka code-switching speech synthesis systems.

2021

整合語者嵌入向量與後置濾波器於提升個人化合成語音之語者相似度 (Incorporating Speaker Embedding and Post-Filter Network for Improving Speaker Similarity of Personalized Speech Synthesis System)
Sheng-Yao Wang | Yi-Chin Huang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 26, Number 2, December 2021

Incorporating speaker embedding and post-filter network for improving speaker similarity of personalized speech synthesis system
Sheng-Yao Wang | Yi-Chin Huang
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

In recent years, speech synthesis systems have become able to generate speech of high quality. However, multi-speaker text-to-speech (TTS) systems still require a large amount of speech data for each target speaker. In this study, we construct a multi-speaker TTS system by incorporating two sub-modules into an artificial neural network-based speech synthesis system to alleviate this problem. The first module adds a speaker embedding to the encoding module, so that speech can be generated without a large amount of speech data from the target speaker. Two main speaker embedding methods, namely speaker verification embedding and voice conversion embedding, are compared to decide which is more suitable for our personalized TTS system. Second, we replaced the conventional post-net module, which is adopted to enhance the output spectrum sequence, with a post-filter network to further improve the speech quality of the generated utterances. Finally, experimental results showed that adding the speaker embedding to the encoding module is useful and that the resulting speech utterances are indeed perceived as the target speaker. Also, the post-filter network not only improves the speech quality but also enhances the speaker similarity of the generated speech utterances. The constructed TTS system can generate a speech utterance of the target speaker in fewer than 2 seconds. In the future, we would like to further investigate the controllability of the speaking rate and the perceived emotional state of the generated speech.
