Khe Chai Sim

Also published as: Khe-Chai Sim


2024

Massive End-to-end Speech Recognition Models with Time Reduction
Weiran Wang | Rohit Prabhavalkar | Haozhe Shan | Zhong Meng | Dongseong Hwang | Qiujia Li | Khe Chai Sim | Bo Li | James Qin | Xingyu Cai | Adam Stooke | Chengjian Zheng | Yanzhang He | Tara Sainath | Pedro Moreno Mengibar
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

We investigate massive end-to-end automatic speech recognition (ASR) models with efficiency improvements achieved by time reduction. The encoders of our models use the neural architecture of Google’s universal speech model (USM), with additional funnel pooling layers that significantly reduce the frame rate and speed up training and inference. We also explore a few practical methods to mitigate the potential accuracy loss due to time reduction while retaining most of the efficiency gain. Our methods are demonstrated to work with both Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T) models, with up to 2B model parameters, and over two domains. For a large-scale voice search recognition task, we perform extensive studies on vocabulary size, time reduction strategy, and generalization to long-form test sets, and show that a 900M RNN-T is very tolerant to severe time reduction, with an encoder output frame rate as low as 640ms. We also provide ablation studies on the LibriSpeech benchmark for important training hyperparameters and architecture designs, training 600M RNN-T models at a frame rate of 160ms.

2014

A Beam-Search Decoder for Disfluency Detection
Xuancong Wang | Hwee Tou Ng | Khe Chai Sim
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Combining Punctuation and Disfluency Prediction: An Empirical Study
Xuancong Wang | Khe Chai Sim | Hwee Tou Ng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2012

Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition
Khe Chai Sim
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2008

NIST 2007 Language Recognition Evaluation: From the Perspective of IIR
Haizhou Li | Bin Ma | Kong-Aik Lee | Khe-Chai Sim | Hanwu Sun | Rong Tong | Donglai Zhu | Changhuai You
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

2007

Semantic Transliteration of Personal Names
Haizhou Li | Khe Chai Sim | Jin-Shea Kuo | Minghui Dong
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics