Jeih-weih Hung

Also published as: Jeih-Weih Hung


2022

pdf
A Preliminary Study of the Application of Discrete Wavelet Transform Features in Conv-TasNet Speech Enhancement Model
Yan-Tong Chen | Zong-Tai Wu | Jeih-Weih Hung
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

Nowadays, time-domain features are widely used alongside frequency-domain features in speech enhancement (SE) networks and achieve excellent performance in eliminating noise from input utterances. This study investigates how to extract information from time-domain utterances to create more effective features for speech enhancement. We propose exploiting sub-signals residing in multiple acoustic frequency bands of the time-domain signal and integrating them into a unified feature set. Specifically, the discrete wavelet transform (DWT) is used to decompose each input frame signal into sub-band signals, and a projection fusion process is then applied to these signals to create the final features. The corresponding fusion strategy is bi-projection fusion (BPF); in short, BPF exploits the sigmoid function to create ratio masks for two feature sources. The concatenation of the fused DWT features and the time features serves as the encoder output of a celebrated SE framework, the fully convolutional time-domain audio separation network (Conv-TasNet), to estimate the mask and then produce the enhanced time-domain utterances. Evaluation experiments are conducted on the VoiceBank-DEMAND and VoiceBank-QUT tasks. The results reveal that the proposed method achieves higher speech quality and intelligibility than the original Conv-TasNet, which uses time features only, indicating that fusing DWT features created from the input utterances with the time features helps learn a superior Conv-TasNet for speech enhancement.
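
A minimal numpy sketch of the bi-projection fusion idea described in this abstract is given below; the projection parameters W_t, W_d, and b are hypothetical placeholders for weights a real model would learn, and the exact formulation in the paper may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bi_projection_fusion(time_feat, dwt_feat, W_t, W_d, b):
    """Fuse two feature sources with a sigmoid ratio mask (bi-projection fusion sketch).

    time_feat, dwt_feat: (frames, dim) arrays from the two feature streams.
    W_t, W_d, b: placeholder projection parameters (learned in the real model).
    """
    mask = sigmoid(time_feat @ W_t + dwt_feat @ W_d + b)   # ratio mask in (0, 1)
    return mask * time_feat + (1.0 - mask) * dwt_feat      # complementary weighting of the sources
```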

pdf
Exploiting the compressed spectral loss for the learning of the DEMUCS speech enhancement network
Chi-En Dai | Qi-Wei Hong | Jeih-Weih Hung
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

This study aims to improve a highly effective speech enhancement technique, DEMUCS, by revising its loss function for learning. DEMUCS, developed by the Facebook team, is built on the Wave-U-Net and consists of convolutional encoding and decoding blocks with an LSTM layer in between. Although DEMUCS processes the input speech utterance purely in the time (wave) domain, its loss function combines a wave-domain L1 distance with a multi-scale short-time Fourier transform (STFT) loss; that is, both time- and frequency-domain features are taken into account in the learning of DEMUCS. In this study, we propose revising the STFT loss in DEMUCS by employing a compressed magnitude spectrogram, where the compression is done either by a power-law operation with a positive exponent less than one or by a logarithmic operation. We evaluate the presented framework on the VoiceBank-DEMAND database and task. The preliminary experimental results suggest that DEMUCS with the power-law compressed magnitude spectral loss outperforms the original DEMUCS, yielding test utterances with higher objective quality and intelligibility scores (PESQ and STOI). In contrast, the logarithmically compressed magnitude spectral loss does not benefit DEMUCS. Therefore, we reveal that DEMUCS can be further improved by properly revising the STFT terms of its loss function.
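
The numpy sketch below illustrates the power-law compressed magnitude spectral loss described above, reduced to a single scale for clarity; the frame length, hop size, and compression exponent are illustrative assumptions, not the settings used in the paper or in DEMUCS itself.

```python
import numpy as np

def compressed_mag_loss(clean, enhanced, n_fft=512, hop=128, power=0.3):
    """L1 distance between power-law compressed magnitude spectrograms (single scale)."""
    def mag_spec(x):
        # Framed, windowed FFT magnitude spectrogram.
        frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
        return np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1))
    # Compress magnitudes with a positive exponent < 1 before taking the L1 distance.
    return np.mean(np.abs(mag_spec(clean) ** power - mag_spec(enhanced) ** power))
```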

2021

pdf
Employing low-pass filtered temporal speech features for the training of ideal ratio mask in speech enhancement
Yan-Tong Chen | Zi-Qiang Lin | Jeih-Weih Hung
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

Masking-based speech enhancement pursues a multiplicative mask that is applied to the spectrogram of the noise-corrupted input utterance, and a deep neural network (DNN) is often used to learn the mask. In particular, features commonly used for automatic speech recognition can serve as the DNN input to learn a well-behaved mask that significantly reduces the noise distortion of processed utterances. This study proposes to preprocess the input speech features for the ideal ratio mask (IRM)-based DNN by low-pass filtering in order to alleviate the noise components. In particular, we employ the discrete wavelet transform (DWT) to decompose the temporal speech feature sequence and scale down the detail coefficients, which correspond to the high-pass portion of the sequence. Preliminary experiments conducted on a subset of the TIMIT corpus reveal that the proposed method makes the resulting IRM achieve higher speech quality and intelligibility for babble noise-corrupted signals than the original IRM, indicating that the low-pass filtered temporal feature sequence can learn a superior IRM network for speech enhancement.
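
As a rough sketch of the low-pass filtering described above, the code below uses PyWavelets to shrink the one-level DWT detail coefficients of each feature dimension's temporal sequence; the wavelet choice and scaling factor are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
import pywt

def lowpass_feature_sequence(feat, wavelet="db4", detail_scale=0.2):
    """Attenuate the high-pass (detail) DWT coefficients of each feature dimension over time.

    feat: (frames, dims) speech feature matrix; returns an array of the same shape.
    """
    out = np.empty_like(feat)
    for d in range(feat.shape[1]):
        approx, detail = pywt.dwt(feat[:, d], wavelet)            # one-level DWT along time
        rec = pywt.idwt(approx, detail * detail_scale, wavelet)   # reconstruct with shrunken details
        out[:, d] = rec[: feat.shape[0]]                          # idwt may return one extra sample
    return out
```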

pdf
使用低通時序列語音特徵訓練理想比率遮罩法之語音強化 (Employing Low-Pass Filtered Temporal Speech Features for the Training of Ideal Ratio Mask in Speech Enhancement)
Yan-Tong Chen | Jeih-weih Hung
International Journal of Computational Linguistics & Chinese Language Processing, Volume 26, Number 2, December 2021

2020

pdf
基於深度聲學模型其狀態精確度最大化之強健語音特徵擷取的初步研究 (The Preliminary Study of Robust Speech Feature Extraction based on Maximizing the Accuracy of States in Deep Acoustic Models)
Li-Chia Chang | Jeih-weih Hung
International Journal of Computational Linguistics & Chinese Language Processing, Volume 25, Number 2, December 2020

pdf
The preliminary study of robust speech feature extraction based on maximizing the accuracy of states in deep acoustic models
Li-chia Chang | Jeih-weih Hung
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)

pdf
Multi-view Attention-based Speech Enhancement Model for Noise-robust Automatic Speech Recognition
Fu-An Chao | Jeih-weih Hung | Berlin Chen
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)

2019

pdf
Speech enhancement based on the integration of fully convolutional network, temporal lowpass filtering and spectrogram masking
Kuan-Yi Liu | Syu-Siang Wang | Yu Tsao | Jeih-weih Hung
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

2017

pdf
多樣訊雜比之訓練語料於降噪自動編碼器其語音強化功能之初步研究 (A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement) [In Chinese]
Shih-Kuang Lee | Syu-Siang Wang | Yu Tsao | Jeih-weih Hung
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)

2013

pdf bib
分頻式調變頻譜分解於強健性語音辨識 (Sub-band modulation spectrum factorization in robust speech recognition) [In Chinese]
Hao-teng Fan | Yi-zhang Cai | Jeih-weih Hung
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

pdf
雜訊環境下應用線性估測編碼於特徵時序列之強健性語音辨識 (Employing linear prediction coding in feature time sequences for robust speech recognition in noisy environments) [In Chinese]
Hao-teng Fan | Wen-yu Tseng | Jeih-weih Hung
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

pdf
雜訊環境下應用線性估測編碼於特徵時序列之強健性語音辨識 (Employing Linear Prediction Coding in Feature Time Sequences for Robust Speech Recognition in Noisy Environments) [In Chinese]
Hao-teng Fan | Wen-yu Tseng | Jeih-weih Hung
International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 4, December 2013-Special Issue on Selected Papers from ROCLING XXV

2012

pdf bib
改良式統計圖等化法強健性語音辨識之研究 (Improved Histogram Equalization Methods for Robust Speech Recognition) [In Chinese]
Hsin-Ju Hsieh | Jeih-weih Hung | Berlin Chen
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)

pdf
語音辨識使用統計圖等化方法 (Speech Recognition Leveraging Histogram Equalization Methods) [In Chinese]
Hsin-Ju Hsieh | Jeih-weih Hung | Berlin Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 4, December 2012-Special Issue on Selected Papers from ROCLING XXIV

2011

pdf bib
Compensating the Speech Features via Discrete Cosine Transform for Robust Speech Recognition (基於離散餘弦轉換之語音特徵的強健性補償法)
Hsin-Ju Hsieh | Wen-hsiang Tu | Jeih-weih Hung
Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing (ROCLING 2011)

pdf bib
機率式調變頻譜分解於強健性語音辨識 (Probabilistic Modulation Spectrum Factorization for Robust Speech Recognition) [In Chinese]
Wen-Yi Chu | Yu-Chen Kao | Berlin Chen | Jeih-Weih Hung
ROCLING 2011 Poster Papers

2010

bib
Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010)
Shih-Hung Wu | Jeih-weih Hung
Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010)

pdf
最小變異數調變頻譜濾波器於強健性語音辨識之研究 (A Study of Minimum Variance Modulation Filter for Robust Speech Recognition) [In Chinese]
Ren-hau Hsieh | Hao-teng Fan | Jeih-weih Hung
Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010)

pdf
進階式調變頻譜補償法於強健性語音辨識之研究 (Advanced Modulation Spectrum Compensation Techniques for Robust Speech Recognition) [In Chinese]
Wei-Jeih Yeh | Wen-hsiang Tu | Jeih-weih Hung
Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010)

2009

bib
Proceedings of the 21st Conference on Computational Linguistics and Speech Processing
June-Jei Kuo | Jeih-Weih Hung
Proceedings of the 21st Conference on Computational Linguistics and Speech Processing

pdf
強健性語音辨識中分頻段調變頻譜補償之研究 (A Study of Sub-band Modulation Spectrum Compensation for Robust Speech Recognition) [In Chinese]
Sheng-yuan Huang | Wen-hsiang Tu | Jeih-weih Hung
Proceedings of the 21st Conference on Computational Linguistics and Speech Processing

pdf
強健性語音辨識中基於小波轉換之分頻統計補償技術的研究 (A Study of Sub-band Feature Statistics Compensation Techniques Based on a Discrete Wavelet Transform for Robust Speech Recognition) [In Chinese]
Hao-teng Fan | Wen-Hsiang Tu | Jeih-weih Hung
Proceedings of the 21st Conference on Computational Linguistics and Speech Processing

pdf
併合式倒頻譜統計正規化技術於強健性語音辨識之研究 (A Study of Hybrid-based Cepstral Statistics Normalization Techniques for Robust Speech Recognition) [In Chinese]
Guan-min He | Wen-Hsiang Tu | Jeih-weih Hung
Proceedings of the 21st Conference on Computational Linguistics and Speech Processing

pdf
Study of Associative Cepstral Statistics Normalization Techniques for Robust Speech Recognition in Additive Noise Environments
Wen-Hsiang Tu | Jeih-weih Hung
International Journal of Computational Linguistics & Chinese Language Processing, Volume 14, Number 1, March 2009

2008

pdf
調變頻譜正規化法使用於強健語音辨識之研究 (Study of Modulation Spectrum Normalization Techniques for Robust Speech Recognition) [In Chinese]
Chih-Cheng Wang | Wen-hsiang Tu | Jeih-weih Hung
Proceedings of the 20th Conference on Computational Linguistics and Speech Processing

pdf
強健性語音辨識中能量相關特徵之改良式正規化技術的研究 (Study of the Improved Normalization Techniques of Energy-Related Features for Robust Speech Recognition) [In Chinese]
Chi-an Pan | Wen-hsiang Tu | Jeih-weih Hung
Proceedings of the 20th Conference on Computational Linguistics and Speech Processing

pdf
組合式倒頻譜統計正規化法於強健性語音辨識之研究 (Associative Cepstral Statistics Normalization Techniques for Robust Speech Recognition) [In Chinese]
Wen-hsiang Tu | Kuang-chieh Wu | Jeih-weih Hung
Proceedings of the 20th Conference on Computational Linguistics and Speech Processing

2007

pdf
加成性雜訊環境下運用特徵參數統計補償法於強健性語音辨識 (Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments) [In Chinese]
Tsung-hsueh Hsieh | Jeih-weih Hung
Proceedings of the 19th Conference on Computational Linguistics and Speech Processing

pdf
端點偵測技術在強健語音參數擷取之研究 (Study of the Voice Activity Detection Techniques for Robust Speech Feature Extraction) [In Chinese]
Wen-Hsiang Tu | Jeih-weih Hung
Proceedings of the 19th Conference on Computational Linguistics and Speech Processing