Bo-Cheng Chan


2022

Investigation of feature processing modules and attention mechanisms in speaker verification system
Ting-Wei Chen | Wei-Ting Lin | Chia-Ping Chen | Chung-Li Lu | Bo-Cheng Chan | Yu-Han Cheng | Hsiang-Feng Chuang | Wei-Yu Chen
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

In this paper, we use several combinations of feature front-end modules and attention mechanisms to improve the performance of our speaker verification system. An updated version of ECAPA-TDNN is chosen as the baseline. We replace and integrate different feature front-end and attention mechanism modules to compare them and find the most effective model design, which becomes our final system. We use the VoxCeleb 2 dataset as our training set and test the performance of our models on several test sets. With our final proposed model, we improve performance by 16% over the baseline on the VoxSRC2022 validation set.

Lightweight Sound Event Detection Model with RepVGG Architecture
Chia-Chuan Liu | Sung-Jen Huang | Chia-Ping Chen | Chung-Li Lu | Bo-Cheng Chan | Yu-Han Cheng | Hsiang-Feng Chuang | Wei-Yu Chen
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

In this paper, we propose RepVGGRNN, a lightweight sound event detection model. We use RepVGG convolution blocks in the convolution part to improve performance, and re-parameterize the RepVGG blocks after the model is trained to reduce the parameters of the convolution layers. To further improve the accuracy of the model, we incorporate both the mean teacher method and knowledge distillation to train the lightweight model. The proposed system achieves PSDS (polyphonic sound event detection score) scenario 1 and scenario 2 scores of 40.8% and 67.7%, outperforming the baseline system's 34.4% and 57.2% on the DCASE 2022 Task 4 validation dataset. The proposed system has about 49.6K parameters, only 44.6% of the baseline system's.

2021

Discussion on domain generalization in the cross-device speaker verification system
Wei-Ting Lin | Yu-Jia Zhang | Chia-Ping Chen | Chung-Li Lu | Bo-Cheng Chan
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

In this paper, we use domain generalization to improve the performance of the cross-device speaker verification system. Based on a trainable speaker verification system, we use domain generalization algorithms to fine-tune the model parameters. First, we use the VoxCeleb2 dataset to train ECAPA-TDNN as a baseline model. Then, we use the CHT-TDSV dataset and the following domain generalization algorithms to fine-tune it: DANN, CDNN, and Deep CORAL. We test the proposed system in 10 different scenarios in the NSYSU-TDSV dataset, covering both single-device and multiple-device conditions. In the multiple-device scenario, the best equal error rate decreases from 18.39 in the baseline to 8.84, successfully achieving cross-device identification in the speaker verification system.

RCRNN-based Sound Event Detection System with Specific Speech Resolution
Sung-Jen Huang | Yih-Wen Wang | Chia-Ping Chen | Chung-Li Lu | Bo-Cheng Chan
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

A sound event detection (SED) system outputs sound events and their time boundaries in audio signals. We propose an RCRNN-based SED system with residual connections and a convolutional block attention mechanism, built on the mean-teacher framework of semi-supervised learning. The neural network can be trained with weakly labeled data and unlabeled data. In addition, we consider that the speech event carries more information than other sound events, so we use a specific time-frequency resolution to extract the acoustic features of the speech event. Furthermore, we apply data augmentation and post-processing to improve performance. On the DCASE 2021 Task 4 validation set, the proposed system achieves a PSDS (Polyphonic Sound Event Detection Score) scenario 2 of 57.6% and an event-based F1-score of 41.6%, outperforming the baseline scores of 52.7% and 40.7%.

2020

NSYSU+CHT 團隊於 2020 遠場語者驗證比賽之語者驗證系統 (NSYSU+CHT Speaker Verification System for Far-Field Speaker Verification Challenge 2020)
Yu-Jia Zhang | Chia-Ping Chen | Shan-Wen Hsiao | Bo-Cheng Chan | Chung-li Lu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 25, Number 2, December 2020

2019

以三元組損失微調時延神經網路語者嵌入函數之語者辨識系統 (Time Delay Neural Network-based Speaker Embedding Function Fine-tuned with Triplet Loss for Distance-based Speaker Recognition)
Chih-Ting Yehn | Po-Chin Wang | Su-Yu Zhang | Chia-Ping Chen | Shan-Wen Hsiao | Bo-Cheng Chan | Chung-li Lu
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)