Chia-Ping Chen


2023

NSYSU-MITLab Speech Recognition System for Formosa Speech Recognition Challenge 2023
Hong-Jie Hu | Chia-Ping Chen
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)

2022

Investigation of feature processing modules and attention mechanisms in speaker verification system
Ting-Wei Chen | Wei-Ting Lin | Chia-Ping Chen | Chung-Li Lu | Bo-Cheng Chan | Yu-Han Cheng | Hsiang-Feng Chuang | Wei-Yu Chen
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

In this paper, we use several combinations of feature front-end modules and attention mechanisms to improve the performance of our speaker verification system. An updated version of ECAPA-TDNN is chosen as the baseline. We replace and integrate different feature front-end and attention mechanism modules to compare them and find the most effective model design, which becomes our final system. We use the VoxCeleb2 dataset as our training set and test the performance of our models on several test sets. With our final proposed model, we improve performance by 16% over the baseline on the VoxSRC2022 validation set.

Development of Mandarin-English code-switching speech synthesis system
Hsin-Jou Lien | Li-Yu Huang | Chia-Ping Chen
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

In this paper, a Mandarin-English code-switching speech synthesis system is proposed. To focus on learning the content information of the two languages, the training dataset is a multilingual artificial dataset whose speaker style is unified. Adding a language embedding to the system helps it adapt to the multilingual dataset. In addition, text preprocessing is applied in different ways depending on the language. Word segmentation and text-to-pinyin conversion are the text preprocessing steps for Mandarin, which not only improve fluency but also reduce learning complexity. Number normalization decides whether the Arabic numerals in a sentence need the digits added. The preprocessing for English is acronym conversion, which decides the pronunciation of acronyms.

Lightweight Sound Event Detection Model with RepVGG Architecture
Chia-Chuan Liu | Sung-Jen Huang | Chia-Ping Chen | Chung-Li Lu | Bo-Cheng Chan | Yu-Han Cheng | Hsiang-Feng Chuang | Wei-Yu Chen
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

In this paper, we propose RepVGGRNN, a lightweight sound event detection model. We use RepVGG convolution blocks in the convolution part to improve performance, and re-parameterize the RepVGG blocks after the model is trained to reduce the parameters of the convolution layers. To further improve the accuracy of the model, we incorporate both the mean teacher method and knowledge distillation to train the lightweight model. The proposed system achieves PSDS (polyphonic sound event detection score) scenario 1 and scenario 2 scores of 40.8% and 67.7%, outperforming the baseline system's 34.4% and 57.2% on the DCASE 2022 Task 4 validation dataset. The number of parameters in the proposed system is about 49.6K, only 44.6% of that of the baseline system.
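
The re-parameterization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single channel and no batch normalization, which are simplifications not taken from the paper; the function names and the toy values are hypothetical. A training-time block with a 3x3 branch, a 1x1 branch, and an identity branch is folded into one 3x3 kernel that gives the same output.

```python
def conv2d_same(image, kernel):
    """Stride-1, zero-padded 2D cross-correlation with an odd-sized square kernel."""
    k = len(kernel)
    r = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:
                        total += kernel[di][dj] * image[y][x]
            out[i][j] = total
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, [[k1]])
    return [[a[i][j] + b[i][j] + image[i][j] for j in range(len(image[0]))]
            for i in range(len(image))]


def fuse(k3, k1):
    """Inference-time form: fold the 1x1 weight and the identity into the 3x3 centre."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + 1.0
    return fused


# Toy values (hypothetical), chosen only to check that both forms agree.
image = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 1.0, 1.0]]
k3 = [[0.5, -1.0, 0.0], [1.0, 2.0, -0.5], [0.0, 1.0, 0.25]]
k1 = 0.75

train_out = multi_branch(image, k3, k1)
fused_out = conv2d_same(image, fuse(k3, k1))
max_diff = max(abs(a - b)
               for row_a, row_b in zip(train_out, fused_out)
               for a, b in zip(row_a, row_b))
```

After fusion only the nine 3x3 weights remain, which is how this kind of block reduces the parameter count at inference time while keeping the output unchanged.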

Mandarin-English Code-Switching Speech Recognition System for Specific Domain
Chung-Pu Chiou | Hou-An Lin | Chia-Ping Chen
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

This paper introduces the use of automatic speech recognition (ASR) technology to process speech content in a specific domain. We use the Conformer end-to-end model as the system architecture and use pure Chinese data for initial training. Next, we use transfer learning to fine-tune the system with Mandarin-English code-switching data. Finally, we use domain-specific Mandarin-English code-switching data for the final fine-tuning of the model, so that it performs adequately on speech recognition in that domain. Experiments with different fine-tuning methods reduce the final error rate from 82.0% to 34.8%.

2021

NSYSU-MITLab團隊於福爾摩沙語音辨識競賽2020之語音辨識系統 (NSYSU-MITLab Speech Recognition System for Formosa Speech Recognition Challenge 2020)
Hung-Pang Lin | Chia-Ping Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 26, Number 1, June 2021

Exploiting Low-Resource Code-Switching Data to Mandarin-English Speech Recognition Systems
Hou-An Lin | Chia-Ping Chen
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

In this paper, we investigate how to use limited code-switching data to implement a code-switching speech recognition system. We use the Transformer end-to-end model to develop our code-switching speech recognition system, which is trained with the Mandarin dataset and a small amount of Mandarin-English code-switching data, as the baseline of this paper. Next, we compare the performance of systems after adding multi-task learning and transfer learning. Character Error Rate (CER) is adopted as the evaluation criterion. Finally, we combine each of the three systems with the language model; our best result lowers the error rate to 23.9%, compared with the baseline of 28.7%.
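
As an illustration of the evaluation criterion named in this abstract, here is a minimal sketch of CER computed from the Levenshtein distance. It assumes every Unicode character counts as one token, which may differ from how the paper scores English words in code-switched text; the function names and the example strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative error-rate reduction with respect to the baseline."""
    return (baseline - improved) / baseline


# Hypothetical example: one character is missing from a six-character reference.
example_cer = cer("今天天氣很好", "今天天氣好")

# The figures reported in the abstract, 28.7% down to 23.9%.
reported_gain = relative_reduction(28.7, 23.9)
```

With the figures reported above, the change from 28.7% to 23.9% corresponds to a relative reduction of roughly 16.7%.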

Discussion on domain generalization in the cross-device speaker verification system
Wei-Ting Lin | Yu-Jia Zhang | Chia-Ping Chen | Chung-Li Lu | Bo-Cheng Chan
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

In this paper, we use domain generalization to improve the performance of the cross-device speaker verification system. Based on a trainable speaker verification system, we use domain generalization algorithms to fine-tune the model parameters. First, we use the VoxCeleb2 dataset to train ECAPA-TDNN as a baseline model. Then, we use the CHT-TDSV dataset and the following domain generalization algorithms to fine-tune it: DANN, CDNN, and Deep CORAL. Our proposed system is tested on 10 different scenarios in the NSYSU-TDSV dataset, including single-device and multiple-device scenarios. Finally, in the multiple-device scenario, the best equal error rate decreases from 18.39 in the baseline to 8.84, showing that the speaker verification system achieves cross-device identification.
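
The equal error rate reported in this abstract is the operating point where the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of one common way to estimate it from trial scores, by sweeping the observed scores as thresholds; it is not taken from the paper, and the function name and the scores are hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping every observed score as a decision threshold.

    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine trials rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostor trials accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical similarity scores for same-speaker and different-speaker trials.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

In this toy case the two error rates meet at a threshold of 0.6, where one of four genuine trials is rejected and one of four impostor trials is accepted.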

RCRNN-based Sound Event Detection System with Specific Speech Resolution
Sung-Jen Huang | Yih-Wen Wang | Chia-Ping Chen | Chung-Li Lu | Bo-Cheng Chan
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

A sound event detection (SED) system outputs sound events and their time boundaries in audio signals. We propose an RCRNN-based SED system with residual connections and a convolution block attention mechanism, based on the mean-teacher framework of semi-supervised learning. The neural network can be trained with weakly labeled data and unlabeled data. In addition, we consider that the speech event carries more information than other sound events. Thus, we use a specific time-frequency resolution to extract the acoustic features of the speech event. Furthermore, we apply data augmentation and post-processing to improve the performance. On the DCASE 2021 Task 4 validation set, the proposed system achieves a PSDS (Polyphonic Sound Event Detection Score) scenario 2 of 57.6% and an event-based F1-score of 41.6%, outperforming the baseline scores of 52.7% and 40.7%.

2020

NSYSU+CHT 團隊於 2020 遠場語者驗證比賽之語者驗證系統 (NSYSU+CHT Speaker Verification System for Far-Field Speaker Verification Challenge 2020)
Yu-Jia Zhang | Chia-Ping Chen | Shan-Wen Hsiao | Bo-Cheng Chan | Chung-li Lu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 25, Number 2, December 2020

Real-Time Single-Speaker Taiwanese-Accented Mandarin Speech Synthesis System
Yih-Wen Wang | Chia-Ping Chen
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)

NSYSU+CHT Speaker Verification System for Far-Field Speaker Verification Challenge 2020
Yu-Jia Zhang | Chia-Ping Chen
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)

2019

即時中文語音合成系統 (Real-Time Mandarin Speech Synthesis System)
An-Chieh Cheng | Chia-Ping Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 24, Number 2, December 2019

即時中文語音合成系統 (Real-Time Mandarin Speech Synthesis System)
An-Chieh Cheng | Chia-Ping Chen
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

以三元組損失微調時延神經網路語者嵌入函數之語者辨識系統 (Time Delay Neural Network-based Speaker Embedding Function Fine-tuned with Triplet Loss for Distance-based Speaker Recognition)
Chih-Ting Yehn | Po-Chin Wang | Su-Yu Zhang | Chia-Ping Chen | Shan-Wen Hsiao | Bo-Cheng Chan | Chung-li Lu
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

2018

Sentiment Analysis on Social Network: Using Emoticon Characteristics for Twitter Polarity Classification
Chia-Ping Chen | Tzu-Hsuan Tseng | Tzu-Hsuan Yang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 23, Number 1, June 2018

結合卷積神經網路與遞迴神經網路於推文極性分類 (Combining Convolutional Neural Network and Recurrent Neural Network for Tweet Polarity Classification) [In Chinese]
Chih-Ting Yeh | Chia-Ping Chen
Proceedings of the 30th Conference on Computational Linguistics and Speech Processing (ROCLING 2018)

deepSA2018 at SemEval-2018 Task 1: Multi-task Learning of Different Label for Affect in Tweets
Zi-Yuan Gao | Chia-Ping Chen
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes our system implementation for subtask V-oc of SemEval-2018 Task 1: Affect in Tweets. We use a multi-task learning method to learn a shared representation, then learn the features for each task. There are five classification models in the proposed multi-task learning approach. These classification models are trained sequentially to learn different features for different classification tasks. In addition to the data released for SemEval-2018, we use datasets from previous SemEvals during system construction. Our Pearson correlation score is 0.638 on the official SemEval-2018 Task 1 test set.

2017

Using Teacher-Student Model For Emotional Speech Recognition [In Chinese]
Po-Wei Hsiao | Po-Chen Hsieh | Chia-Ping Chen
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)

deepSA at SemEval-2017 Task 4: Interpolated Deep Neural Networks for Sentiment Analysis in Twitter
Tzu-Hsuan Yang | Tzu-Hsuan Tseng | Chia-Ping Chen
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper, we describe our system implementation for sentiment analysis in Twitter. This system combines two models based on deep neural networks, namely a convolutional neural network (CNN) and a long short-term memory (LSTM) recurrent neural network, through interpolation. Distributed representations of words as vectors are the input to the system, and the output is a sentiment class. The neural network models are trained exclusively with the data sets provided by the organizers of SemEval-2017 Task 4 Subtask A. Overall, this system achieves 0.618 for the average recall rate, 0.587 for the average F1 score, and 0.618 for accuracy.
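
The combination step described in this abstract can be sketched as a linear interpolation of the class probabilities produced by the two models. This is a minimal sketch under stated assumptions: the abstract does not give the interpolation formula or weight, so the linear form, the weight of 0.5, the function names, and the probability values are all hypothetical.

```python
LABELS = ["negative", "neutral", "positive"]  # the three classes of Subtask A


def interpolate(p_cnn, p_lstm, weight):
    """Linearly interpolate two class-probability vectors.

    weight is the share given to the CNN; the LSTM receives 1 - weight.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(p_cnn) != len(p_lstm):
        raise ValueError("probability vectors must have the same length")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_cnn, p_lstm)]


def predict(p_cnn, p_lstm, weight=0.5):
    """Return the label with the highest interpolated probability."""
    probs = interpolate(p_cnn, p_lstm, weight)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best]


# Hypothetical model outputs for one tweet.
cnn_probs = [0.2, 0.5, 0.3]
lstm_probs = [0.1, 0.3, 0.6]
label = predict(cnn_probs, lstm_probs, weight=0.5)
```

Here the CNN alone would choose "neutral" and the LSTM alone "positive"; the interpolated scores favor "positive". In practice the weight would be tuned on held-out data.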

2016

以多層感知器辨識情緒於國台客語料庫 (Use Multilayer Perceptron To Recognize Emotion in Mandarin, Taiwanese and Hakka Database) [In Chinese]
Chia-Hsien Chan | Chia-Ping Chen
Proceedings of the 28th Conference on Computational Linguistics and Speech Processing (ROCLING 2016)

Support Super-Vector Machines in Automatic Speech Emotion Recognition
Chia-Ying Chen | Chia-Ping Chen
Proceedings of the 28th Conference on Computational Linguistics and Speech Processing (ROCLING 2016)

標記對於類神經語音情緒辨識系統辨識效果之影響 (Effects of Label in Neural Speech Emotion Recognition System) [In Chinese]
Tung-Han Wu | Chia-Ping Chen
Proceedings of the 28th Conference on Computational Linguistics and Speech Processing (ROCLING 2016)

2014

台灣情緒語料庫建置與辨識 (An Emotional Speech Database in Taiwan: Collection and Recognition) [In Chinese]
Bo-Chang Chiou | Chia-Ping Chen
Proceedings of the 26th Conference on Computational Linguistics and Speech Processing (ROCLING 2014)

2013

Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)
Hung-Duen Yang | Wen-Lian Hsu | Chia-Ping Chen
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

基於Sphinx 可快速個人化行動數字語音辨識系統 (Quickly Personalizable Mobile Digit Speech Recognition System Based on Sphinx) [In Chinese]
Tsung-Peng Yen | Chia-Ping Chen
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

基於時域上基週同步疊加法之歌聲合成系統 (Singing Voice Synthesis System Based on Time Domain-Pitch Synchronized Overlap and Add) [In Chinese]
Ming-Kuan Wu | Chia-Ping Chen
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 4, December 2013-Special Issue on Selected Papers from ROCLING XXV
Chia-Hui Chang | Chia-Ping Chen | Jia-Ching Wang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 4, December 2013-Special Issue on Selected Papers from ROCLING XXV

2012

應用串接方法於連續變化轉速之四行程引擎聲音合成 (Concatenation-based Method for the Synthesis of Engine Noise with Continuously Varying Speed) [In Chinese]
Ming-Kuan Wu | Chia-Ping Chen
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)

International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 4, December 2012-Special Issue on Selected Papers from ROCLING XXIV
Liang-Chih Yu | Richard Tzong-Han Tsai | Chia-Ping Chen | Cheng-Zen Yang | Shu-Kai Hsieh
International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 4, December 2012-Special Issue on Selected Papers from ROCLING XXIV

2011

Semantic Information and Derivation Rules for Robust Dialogue Act Detection in a Spoken Dialogue System
Wei-Bin Liang | Chung-Hsien Wu | Chia-Ping Chen
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2009

Noise-Robust Speech Features Based on Cepstral Time Coefficients
Ja-Zang Yeh | Chia-Ping Chen
Proceedings of the 21st Conference on Computational Linguistics and Speech Processing

A Framework for Machine Translation Output Combination
Yi-Chang Chen | Chia-Ping Chen
ROCLING 2009 Poster Papers

2006

Automatic Learning of Context-Free Grammar
Tai-Hung Chen | Chun-Han Tseng | Chia-Ping Chen
Proceedings of the 18th Conference on Computational Linguistics and Speech Processing

An Approach to Using the Web as a Live Corpus for Spoken Transliteration Name Access
Ming-Shun Lin | Chia-Ping Chen | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 3, September 2006: Special Issue on Selected Papers from ROCLING XVII

2005

An Approach of Using the Web as a Live Corpus for Spoken Transliteration Name Access
Ming-Shun Lin | Chia-Ping Chen | Hsin-Hsi Chen
Proceedings of the 17th Conference on Computational Linguistics and Speech Processing