Abstract
Time-domain features, like frequency-domain features, are now widely used in speech enhancement (SE) networks and achieve excellent performance in removing noise from input utterances. This study investigates how to extract information from time-domain utterances to create more effective features for speech enhancement. We propose exploiting sub-signals that reside in multiple acoustic frequency bands of the time-domain signal and integrating them into a unified feature set. Each input frame is decomposed into sub-band signals with the discrete wavelet transform (DWT), and a projection fusion process is applied to these signals to create the ultimate features. The corresponding fusion strategy is bi-projection fusion (BPF); in short, BPF exploits the sigmoid function to create ratio masks for two feature sources. The concatenation of the fused DWT features and the time features serves as the encoder output of a well-known SE framework, the fully-convolutional time-domain audio separation network (Conv-TasNet), to estimate the mask and produce the enhanced time-domain utterances. Evaluation experiments are conducted on the VoiceBank-DEMAND and VoiceBank-QUT tasks. The results reveal that the proposed method achieves higher speech quality and intelligibility than the original Conv-TasNet, which uses time features only, indicating that fusing DWT features created from the input utterances helps the time features learn a superior Conv-TasNet for speech enhancement.
- Anthology ID:
- 2022.rocling-1.12
- Volume:
- Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
- Month:
- November
- Year:
- 2022
- Address:
- Taipei, Taiwan
- Editors:
- Yung-Chun Chang, Yi-Chin Huang
- Venue:
- ROCLING
- SIG:
- Publisher:
- The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
- Note:
- Pages:
- 92–99
- Language:
- Chinese
- URL:
- https://aclanthology.org/2022.rocling-1.12
- DOI:
- Cite (ACL):
- Yan-Tong Chen, Zong-Tai Wu, and Jeih-Weih Hung. 2022. A Preliminary Study of the Application of Discrete Wavelet Transform Features in Conv-TasNet Speech Enhancement Model. In Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), pages 92–99, Taipei, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
- Cite (Informal):
- A Preliminary Study of the Application of Discrete Wavelet Transform Features in Conv-TasNet Speech Enhancement Model (Chen et al., ROCLING 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2022.rocling-1.12.pdf
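The pipeline described in the abstract can be illustrated with a minimal sketch: a one-level Haar DWT splits a time-domain frame into low-band (approximation) and high-band (detail) sub-signals, and a BPF-style step blends two projected feature sources through a sigmoid ratio mask. The exact BPF formulation and the projection matrices (`W1`, `W2`, `Wg`) here are assumptions for illustration, not the paper's precise architecture; the paper also uses a learned Conv-TasNet encoder rather than fixed projections.

```python
import numpy as np

def haar_dwt(frame):
    """One-level Haar DWT: split an even-length frame into an
    approximation (low-band) and a detail (high-band) sub-signal."""
    x = np.asarray(frame, dtype=float).reshape(-1, 2)
    approx = (x[:, 0] + x[:, 1]) / np.sqrt(2.0)  # low-pass half-band
    detail = (x[:, 0] - x[:, 1]) / np.sqrt(2.0)  # high-pass half-band
    return approx, detail

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpf_fuse(f1, f2, W1, W2, Wg):
    """Bi-projection fusion, assumed form: project each feature source,
    then blend the projections with a sigmoid ratio mask g in (0, 1)."""
    g = sigmoid(Wg @ np.concatenate([f1, f2]))
    return g * (W1 @ f1) + (1.0 - g) * (W2 @ f2)
```

As a quick sanity check, a constant pair-wise frame such as `[1, 1, 2, 2]` yields zero detail coefficients under the Haar DWT, and a zero gating matrix `Wg` makes `bpf_fuse` an equal-weight average of the two projected sources.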