Xiaohui Wang


LightSeq: A High Performance Inference Library for Transformers
Xiaohui Wang | Ying Xiong | Yang Wei | Mingxuan Wang | Lei Li
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Transformer and its variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose LightSeq, a highly efficient inference library for models in the Transformer family. LightSeq includes a series of GPU optimization techniques to both streamline the computation of Transformer layers and reduce memory footprint. LightSeq supports models trained using PyTorch and TensorFlow. Experimental results on standard machine translation benchmarks show that LightSeq achieves up to 14x speedup compared with TensorFlow and 1.4x speedup compared with a concurrent CUDA implementation. The code will be released publicly after the review.


Syntax-Aware Chinese Frame Semantic Role Labeling Based on Self-Attention (基于Self-Attention的句法感知汉语框架语义角色标注)
Xiaohui Wang (王晓晖) | Ru Li (李茹) | Zhiqiang Wang (王智强) | Qinghua Chai (柴清华) | Xiaoqi Han (韩孝奇)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

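Frame Semantic Role Labeling (FSRL) is a semantic analysis task based on the FrameNet annotation scheme. Semantic role labeling usually depends heavily on syntax. Most current semantic role labeling models are based on bidirectional long short-term memory networks (Bi-LSTM), which can capture long-distance dependencies within a sentence but cannot capture its syntactic information well. We therefore introduce a self-attention mechanism to capture the syntactic information of each word in the sentence. Experimental results show that the model achieves an F1 of 83.77% on the CFN (Chinese FrameNet) dataset, an improvement of nearly 11%.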


WAPUSK20 - A Database for Robust Audiovisual Speech Recognition
Alexander Vorwerk | Xiaohui Wang | Dorothea Kolossa | Steffen Zeiler | Reinhold Orglmeister
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Audiovisual speech recognition (AVSR) systems have been proven superior to audio-only speech recognizers in noisy environments by incorporating features of the visual modality. In order to develop reliable AVSR systems, appropriate simultaneously recorded speech and video data are needed. In this paper, we introduce a corpus (WAPUSK20) that consists of audiovisual data of 20 speakers uttering 100 sentences each, with four channels of audio and a stereoscopic video. The latter is intended to support more accurate lip tracking and the development of stereo-data-based normalization techniques for greater robustness of the recognition results. The sentence design has been adopted from the GRID corpus, which has been widely used for AVSR experiments. Recordings were made under acoustically realistic conditions in an ordinary office room. Affordable hardware was used, such as a pre-calibrated stereo camera and standard PC components. The software written to create this corpus was designed in MATLAB with the help of hardware-specific software provided by the hardware manufacturers and freely available open source software.