Yu Tsao - ACL Anthology

This is an internal, incomplete preview of a proposed change to the ACL Anthology. For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes. Do not treat this content as an official publication.

Yu Tsao

2025

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
Siyin Wang | Wenyi Yu | Xianzhao Chen | Xiaohai Tian | Jun Zhang | Lu Lu | Yu Tsao | Junichi Yamagishi | Yuxuan Wang | Chao Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper explores a novel perspective to speech quality assessment by leveraging natural language descriptions, offering richer, more nuanced insights than traditional numerical scoring methods. Natural language feedback provides instructive recommendations and detailed evaluations, yet existing datasets lack the comprehensive annotations needed for this approach. To bridge this gap, we introduce QualiSpeech, a comprehensive low-level speech quality assessment dataset encompassing 11 key aspects and detailed natural language comments that include reasoning and contextual insights. Additionally, we propose the QualiSpeech Benchmark to evaluate the low-level speech understanding capabilities of auditory large language models (LLMs). Experimental results demonstrate that finetuned auditory LLMs can reliably generate detailed descriptions of noise and distortion, effectively identifying their types and temporal characteristics. The results further highlight the potential for incorporating reasoning to enhance the accuracy and reliability of quality assessments. The dataset can be found at https://huggingface.co/datasets/tsinghua-ee/QualiSpeech.

2023

Sound Processing for Cochlear Implants: The Journey of Innovation Toward Artificial Intelligence
Enoch Hsin-Ho Huang | Chao-Min Wu | Yu Tsao
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)

2022

XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding
Chan-Jan Hsu | Hung-yi Lee | Yu Tsao
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks. This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders. Our framework is inspired by cross-modal encoders’ success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU. After training with a small number of extra adapting steps and finetuned, the proposed XDBERT (cross-modal distilled BERT) outperforms pretrained-BERT in general language understanding evaluation (GLUE), situations with adversarial generations (SWAG) benchmarks, and readability benchmarks. We analyze the performance of XDBERT on GLUE to show that the improvement is likely visually grounded.

Chinese Movie Dialogue Question Answering Dataset
Shang-Bao Luo | Cheng-Chung Fan | Kuan-Yu Chen | Yu Tsao | Hsin-Min Wang | Keh-Yih Su
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

This paper constructs a Chinese dialogue-based information-seeking question answering dataset CMDQA, which is mainly applied to the scenario of getting Chinese movie related information. It contains 10K QA dialogs (40K turns in total). All questions and background documents are compiled from the Wikipedia via an Internet crawler. The answers to the questions are obtained via extracting the corresponding answer spans within the related text passage. In CMDQA, in addition to searching related documents, pronouns are also added to the question to better mimic the real dialog scenario. This dataset can test the individual performance of the information retrieval, the question answering and the question re-writing modules. This paper also provides a baseline system and shows its performance on this dataset. The experiments elucidate that it still has a big gap to catch the human performance. This dataset thus provides enough challenge for the researcher to conduct related research.

2021

This paper presents a framework to answer the questions that require various kinds of inference mechanisms (such as Extraction, Entailment-Judgement, and Summarization). Most of the previous approaches adopt a rigid framework which handles only one inference mechanism. Only a few of them adopt several answer generation modules for providing different mechanisms; however, they either lack an aggregation mechanism to merge the answers from various modules, or are too complicated to be implemented with neural networks. To alleviate the problems mentioned above, we propose a divide-and-conquer framework, which consists of a set of various answer generation modules, a dispatch module, and an aggregation module. The answer generation modules are designed to provide different inference mechanisms, the dispatch module is used to select a few appropriate answer generation modules to generate answer candidates, and the aggregation module is employed to select the final answer. We test our framework on the 2020 Formosa Grand Challenge Contest dataset. Experiments show that the proposed framework outperforms the state-of-the-art Roberta-large model by about 11.4%.

2019

Speech enhancement based on the integration of fully convolutional network, temporal lowpass filtering and spectrogram masking
Kuan-Yi Liu | Syu-Siang Wang | Yu Tsao | Jeih-weih Hung
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

2018

WaveNet 聲碼器及其於語音轉換之應用 (WaveNet Vocoder and its Applications in Voice Conversion) [In Chinese]
Wen-Chin Huang | Chen-Chou Lo | Hsin-Te Hwang | Yu Tsao | Hsin-Min Wang
Proceedings of the 30th Conference on Computational Linguistics and Speech Processing (ROCLING 2018)

2017

Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)
Lun-Wei Ku | Yu Tsao
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)

多樣訊雜比之訓練語料於降噪自動編碼器其語音強化功能之初步研究 (A Preliminary Study of Various SNR-level Training Data in the Denoising Auto-encoder (DAE) Technique for Speech Enhancement) [In Chinese]
Shih-Kuang Lee | Syu-Siang Wang | Yu Tsao | Jeih-weih Hung
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)

基於鑑別式自編碼解碼器之錄音回放攻擊偵測系統 (A Replay Spoofing Detection System Based on Discriminative Autoencoders) [In Chinese]
Yu-Ding Lu | Hung-Shin Lee | Yu Tsao | Hsin-Min Wang
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)

改進的向量空間可適性濾波器用於聲學回聲消除 (Acoustic Echo Cancellation Using an Improved Vector-Space-Based Adaptive Filtering Algorithm) [In Chinese]
Jin Li-You | Yu Tsao | Ying-Ren Chien
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)

以語音能量特性發展即時語速偵測裝置-前導型研究 (Real-time monitoring device of phonation speed and volume based on speech energy: A pilot study) [In Chinese]
Chi-Te Wang | Feng-Chuan Lin | Wei-Zhung Zheng | Shih-Hau Fang | Yu Tsao | Ying-Hui Lai
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)

以軟體為基礎建構語音增強系統使用者介面 (Development of a software-based User-Interface of Speech Enhancement System) [In Chinese]
Tao-Wei Wang | Yu Tsao | Ying-Hui Lai | Hsiang-Ping Hsu | Chia-Lung Wu
Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017)

改進的向量空間可適性濾波器用於聲學回聲消除 (Acoustic Echo Cancellation Using an Improved Vector-Space-Based Adaptive Filtering Algorithm) [In Chinese]
Jin Li-You | Yu Tsao | Ying-Ren Chien
International Journal of Computational Linguistics & Chinese Language Processing, Volume 22, Number 2, December 2017-Special Issue on Selected Papers from ROCLING XXIX

基於鑑別式自編碼解碼器之錄音回放攻擊偵測系統 (A Replay Spoofing Detection System Based on Discriminative Autoencoders) [In Chinese]
Chia-Lung Wu | Hsiang-Ping Hsu | Yu-Ding Lu | Yu Tsao | Hung-Shin Lee | Hsin-Min Wang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 22, Number 2, December 2017-Special Issue on Selected Papers from ROCLING XXIX

2016

非負矩陣分解法於語音調變頻譜強化之研究(A study of enhancing the modulation spectrum of speech signals via nonnegative matrix factorization)[In Chinese]
Xu-Xiang Wang | Zhi-Hao Zheng | Yu Tsao | Jhih-Wei Hong
Proceedings of the 28th Conference on Computational Linguistics and Speech Processing (ROCLING 2016)

2015

語音增強基於小腦模型控制器(A Speech Enhancement System Based on Cerebellar Model Articulation Controller) [In Chinese]
Hao-Chun Chu | Jung-Hsi Lee | Shih-Hau Fang | Chih-Min Lin | Yun-Fan Chang | Yu Tsao
Proceedings of the 27th Conference on Computational Linguistics and Speech Processing (ROCLING 2015)

類神經網路訓練結合環境群集及專家混合系統於強健性語音辨識(Automatic Speech Recognition using Neural Network based Acoustic Model with the Environment Clustering and Mixture of Experts Algorithms) [In Chinese]
Chia-Yung Hsu | Jia-Ching Wang | Yu Tsao
Proceedings of the 27th Conference on Computational Linguistics and Speech Processing (ROCLING 2015)

2013

Semantic Naïve Bayes Classifier for Document Classification
How Jing | Yu Tsao | Kuan-Yu Chen | Hsin-Min Wang
Proceedings of the Sixth International Joint Conference on Natural Language Processing

結合I-Vector 及深層神經網路之語者驗證系統 (Text-independent Speaker Verification using a Hybrid I-Vector/DNN Approach) [In Chinese]
Yun-Fan Chang | Yu Tsao | Shao-Hua Cheng | Kai-Hsuan Chan | Chia-Wei Liao | Wen-Tsung Chang
Proceedings of the 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013)

Co-authors

Venues