Bin Ma


2021

A Unified Speaker Adaptation Approach for ASR
Yingzhu Zhao | Chongjia Ni | Cheung-Chi Leung | Shafiq Joty | Eng Siong Chng | Bin Ma
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Transformer models have been used successfully in automatic speech recognition (ASR) and yield state-of-the-art results. However, their performance is still affected by speaker mismatch between training and test data. Further finetuning a trained model with target speaker data is the most natural approach for adaptation, but it takes a lot of compute and may cause catastrophic forgetting of the existing speakers. In this work, we propose a unified speaker adaptation approach consisting of feature adaptation and model adaptation. For feature adaptation, we employ a speaker-aware persistent memory model, which generalizes better to unseen test speakers by making use of speaker i-vectors to form a persistent memory. For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which, to the best of our knowledge, has never been explored in ASR. Specifically, we gradually prune less contributing parameters on the model encoder to a certain sparsity level, and use the pruned parameters for adaptation, while freezing the unpruned parameters to keep the original model performance. We conduct experiments on the Librispeech dataset. Our proposed approach brings a relative word error rate (WER) reduction of 2.74-6.52% on general speaker adaptation. On target speaker adaptation, our method outperforms the baseline with up to 20.58% relative WER reduction, and surpasses the finetuning method by up to 2.54% relative. In addition, with extremely low-resource adaptation data (e.g., 1 utterance), our method can improve the WER by a relative 6.53% with only a few epochs of training.

2018

Alibaba Speech Translation Systems for IWSLT 2018
Nguyen Bach | Hongjie Chen | Kai Fan | Cheung-Chi Leung | Bo Li | Chongjia Ni | Rong Tong | Pei Zhang | Boxing Chen | Bin Ma | Fei Huang
Proceedings of the 15th International Conference on Spoken Language Translation

This work describes the En→De Alibaba speech translation system developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2018. To improve ASR performance, multiple ASR models, including conventional and end-to-end models, are built, and model fusion is applied in the final step. ASR pre- and post-processing techniques such as speech segmentation, punctuation insertion, and sentence splitting are found to be very useful for MT. We also employ most techniques that proved effective during the WMT 2018 evaluation, such as BPE, back translation, data selection, model ensembling and reranking. Combined, these ASR and MT techniques improve the speech translation quality significantly.

2013

Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions
Xiaoming Lu | Lei Xie | Cheung-Chi Leung | Bin Ma | Haizhou Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

Thread Cleaning and Merging for Microblog Topic Detection
Jianfeng Zhang | Yunqing Xia | Bin Ma | Jianmin Yao | Yu Hong
Proceedings of 5th International Joint Conference on Natural Language Processing

Using Cross-Entity Inference to Improve Event Extraction
Yu Hong | Jianfeng Zhang | Bin Ma | Jianmin Yao | Guodong Zhou | Qiaoming Zhu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2008

NIST 2007 Language Recognition Evaluation: From the Perspective of IIR
Haizhou Li | Bin Ma | Kong-Aik Lee | Khe-Chai Sim | Hanwu Sun | Rong Tong | Donglai Zhu | Changhuai You
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

2006

A Comparative Study of Four Language Identification Systems
Bin Ma | Haizhou Li
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 2, June 2006

2005

A Phonotactic Language Model for Spoken Language Identification
Haizhou Li | Bin Ma
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)