Khe Chai Sim
Also published as: Khe-Chai Sim
2024
Massive End-to-end Speech Recognition Models with Time Reduction
Weiran Wang | Rohit Prabhavalkar | Haozhe Shan | Zhong Meng | Dongseong Hwang | Qiujia Li | Khe Chai Sim | Bo Li | James Qin | Xingyu Cai | Adam Stooke | Chengjian Zheng | Yanzhang He | Tara Sainath | Pedro Moreno Mengibar
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
We investigate massive end-to-end automatic speech recognition (ASR) models with efficiency improvements achieved by time reduction. The encoders of our models use the neural architecture of Google’s universal speech model (USM), with additional funnel pooling layers to significantly reduce the frame rate and speed up training and inference. We also explore a few practical methods to mitigate the potential accuracy loss due to time reduction, while retaining most of the efficiency gains. Our methods are demonstrated to work with both Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), with up to 2B model parameters, across two domains. For a large-scale voice search recognition task, we perform extensive studies on vocabulary size, time reduction strategy, and generalization performance on long-form test sets, and show that a 900M RNN-T is very tolerant to severe time reduction, with encoder output frame rates as low as 640ms. We also provide ablation studies on the Librispeech benchmark for important training hyperparameters and architecture designs, training 600M RNN-T models at a frame rate of 160ms.
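The abstract above describes funnel pooling layers that downsample the encoder's time axis to lower the output frame rate. As a rough illustration only (not the paper's actual implementation, which operates inside a USM-style encoder), stride-2 mean pooling over consecutive frames halves the frame rate each time it is applied:

```python
def time_reduce(frames, stride=2):
    """Mean-pool consecutive feature frames along time, cutting the
    frame rate by a factor of `stride` (illustrative sketch only)."""
    pooled = []
    for i in range(0, len(frames) - stride + 1, stride):
        group = frames[i:i + stride]
        # average each feature dimension across the grouped frames
        pooled.append([sum(col) / stride for col in zip(*group)])
    return pooled

# 8 frames -> 4 frames -> 2 frames: each application doubles the
# effective frame duration (e.g. 80 ms -> 160 ms -> 320 ms).
frames = [[float(t)] * 4 for t in range(8)]
once = time_reduce(frames)
twice = time_reduce(once)
```

Stacking several such reductions is how an encoder can reach very low frame rates (the paper reports tolerance down to 640 ms outputs) while processing proportionally fewer frames in later layers.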
2014
A Beam-Search Decoder for Disfluency Detection
Xuancong Wang | Hwee Tou Ng | Khe Chai Sim
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
Combining Punctuation and Disfluency Prediction: An Empirical Study
Xuancong Wang | Khe Chai Sim | Hwee Tou Ng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2012
Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition
Khe Chai Sim
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2008
NIST 2007 Language Recognition Evaluation: From the Perspective of IIR
Haizhou Li | Bin Ma | Kong-Aik Lee | Khe-Chai Sim | Hanwu Sun | Rong Tong | Donglai Zhu | Changhuai You
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation