Yih-Wen Wang
2021
RCRNN-based Sound Event Detection System with Specific Speech Resolution
Sung-Jen Huang, Yih-Wen Wang, Chia-Ping Chen, Chung-Li Lu, Bo-Cheng Chan
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
A sound event detection (SED) system outputs the sound events in an audio signal together with their time boundaries. We propose an RCRNN-based SED system with residual connections and a convolutional block attention mechanism, built on the mean-teacher framework for semi-supervised learning. The neural network can be trained with weakly labeled and unlabeled data. In addition, we consider speech events to carry more information than other sound events, so we use a specific time-frequency resolution to extract acoustic features for speech events. Furthermore, we apply data augmentation and post-processing to improve performance. On the DCASE 2021 Task 4 validation set, the proposed system achieves a PSDS (Polyphonic Sound Detection Score) scenario 2 of 57.6% and an event-based F1-score of 41.6%, outperforming the baseline scores of 52.7% and 40.7%, respectively.
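The mean-teacher framework named above keeps two copies of the network: a student trained by gradient descent and a teacher whose weights are an exponential moving average (EMA) of the student's. On unlabeled clips, a consistency loss pulls the student's predictions toward the teacher's. The sketch below illustrates only that generic mechanism, not the authors' implementation: weights are flat lists of floats, and the names `ema_update` and `consistency_loss`, the smoothing factor `alpha`, and the mean-squared-error consistency term are all illustrative assumptions.

```python
# Generic mean-teacher sketch (illustrative assumption, not the paper's code).
# Model weights are represented as flat lists of floats for simplicity.

def ema_update(teacher, student, alpha=0.999):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same length")
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher frame-level probabilities."""
    if len(student_probs) != len(teacher_probs):
        raise ValueError("prediction sequences must have the same length")
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


if __name__ == "__main__":
    teacher = [0.0, 1.0]
    student = [1.0, 1.0]
    # After each student optimizer step, the teacher drifts slowly toward the student.
    teacher = ema_update(teacher, student, alpha=0.5)
    print(teacher)
    print(consistency_loss([1.0, 0.0], [0.0, 0.0]))
```

A large `alpha` (close to 1) makes the teacher a smoothed average over many student updates, which is what makes its predictions a more stable target for the unlabeled data.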
2020
Real-Time Single-Speaker Taiwanese-Accented Mandarin Speech Synthesis System
Yih-Wen Wang, Chia-Ping Chen
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)