Yuhang Guo


2022

BIT-Xiaomi’s System for AutoSimTrans 2022
Mengge Liu | Xiang Li | Bao Chen | Yanzhi Tian | Tianwei Lan | Silin Li | Yuhang Guo | Jian Luan | Bin Wang
Proceedings of the Third Workshop on Automatic Simultaneous Translation

This system paper describes the BIT-Xiaomi simultaneous translation system for the AutoSimTrans 2022 simultaneous translation challenge. We participated in three tracks: the Zh-En text-to-text track, the Zh-En audio-to-text track and the En-Es text-to-text track. In our system, wait-k is employed to train prefix-to-prefix translation models. We integrate streaming chunking to detect boundaries as the streaming source is read in. We further improve our system with data selection, data augmentation and R-drop training methods. Results show that our wait-k implementation outperforms the organizer's baseline by up to 8 BLEU, and our proposed streaming chunking method further improves BLEU by about 2 points in the low-latency regime.

Ancient Chinese Word Segmentation and Part-of-Speech Tagging Using Data Augmentation
Yanzhi Tian | Yuhang Guo
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

We participated in the EvaHan2022 ancient Chinese word segmentation and Part-of-Speech (POS) tagging evaluation. We treat Chinese word segmentation and POS tagging as sequence tagging tasks. Our system is based on a BERT-BiLSTM-CRF model trained on the data provided by the EvaHan2022 evaluation. In addition, we employ data augmentation techniques to enhance the performance of our model. On Test A and Test B of the evaluation, our system achieves F1 scores of 94.73% and 90.93% for word segmentation, and 89.19% and 83.48% for POS tagging.
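Treating segmentation and POS tagging as one sequence tagging task means assigning each character a label that encodes both its position in a word and the word's POS. The sketch below shows one common encoding, assumed here rather than taken from the paper: a BMES boundary tag (Begin, Middle, End, Single) joined to the POS label. The function names and the example POS labels (`nr`, `v`) are illustrative only; a tagger such as BERT-BiLSTM-CRF would predict these per-character labels.

```python
from typing import List, Tuple


def to_char_tags(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (word, POS) pairs to per-character 'BMES-POS' labels (assumed scheme)."""
    out: List[Tuple[str, str]] = []
    for word, pos in words:
        if len(word) == 1:
            out.append((word, f"S-{pos}"))  # single-character word
            continue
        for i, ch in enumerate(word):
            if i == 0:
                boundary = "B"
            elif i == len(word) - 1:
                boundary = "E"
            else:
                boundary = "M"
            out.append((ch, f"{boundary}-{pos}"))
    return out


def from_char_tags(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Inverse mapping: rebuild (word, POS) pairs from 'BMES-POS' character labels."""
    words: List[Tuple[str, str]] = []
    buf, pos = "", ""
    for ch, tag in tagged:
        boundary, pos = tag.split("-", 1)
        buf += ch
        if boundary in ("S", "E"):  # a word ends at this character
            words.append((buf, pos))
            buf = ""
    if buf:  # flush a dangling word if the label sequence is ill-formed
        words.append((buf, pos))
    return words
```

For instance, `to_char_tags([("孔夫子", "nr"), ("曰", "v")])` yields the labels B-nr, M-nr, E-nr, S-v, and `from_char_tags` recovers the original words.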

The Xiaomi Text-to-Text Simultaneous Speech Translation System for IWSLT 2022
Bao Guo | Mengge Liu | Wen Zhang | Hexuan Chen | Chang Mu | Xiang Li | Jianwei Cui | Bin Wang | Yuhang Guo
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This system paper describes the Xiaomi Translation System for the IWSLT 2022 Simultaneous Speech Translation (SST) shared task. We participate in the English-to-Mandarin Chinese Text-to-Text (T2T) track. Our system is built on the Transformer model with novel techniques borrowed from our recent research work. For data filtering, language-model-based and rule-based methods are applied to obtain high-quality bilingual parallel corpora. We also strengthen our system with several dominant data augmentation techniques, such as knowledge distillation, tagged back-translation, and iterative back-translation. We also incorporate novel training techniques such as R-drop, deep models, and large-batch training, which have been shown to benefit the vanilla Transformer model. In the SST scenario, several variations of wait-k strategies are explored. Furthermore, in terms of robustness, both data-based and model-based methods are used to reduce the sensitivity of our system to Automatic Speech Recognition (ASR) outputs. We finally design some inference algorithms and use an adaptive-ensemble method based on multiple model variants to further improve the performance of the system. Compared with strong baselines, fusing all techniques improves our system by 2 to 3 BLEU under different latency regimes.

2021

BIT’s system for AutoSimulTrans2021
Mengge Liu | Shuoying Chen | Minqin Li | Zhipeng Wang | Yuhang Guo
Proceedings of the Second Workshop on Automatic Simultaneous Translation

In this paper we introduce our Chinese-English simultaneous translation system participating in AutoSimulTrans2021. In simultaneous translation, both translation quality and latency are important. To reduce latency, we cut the streaming source sentence into segments and translate each segment before the full sentence is received. To obtain high-quality translations, we pre-train a translation model on an adequate corpus and fine-tune it with domain adaptation and sentence length adaptation. Experimental results on the evaluation data show that our system outperforms the baseline system.

2020

BIT’s system for the AutoSimTrans 2020
Minqin Li | Haodong Cheng | Yuanjie Wang | Sijia Zhang | Liting Wu | Yuhang Guo
Proceedings of the First Workshop on Automatic Simultaneous Translation

This paper describes our machine translation systems for the streaming Chinese-to-English translation task of AutoSimTrans 2020. We present a sentence-length-based method and a sentence-boundary-detection-model-based method for segmenting the streaming input. Experimental results on translating the transcriptions and the ASR outputs of the development data sets show that the system using the detection-model-based method outperforms the one using the length-based method by 1.19 and 0.99 BLEU, respectively, under similar or better latency.
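A length-based segmenter buffers incoming source tokens and releases a segment for translation once the buffer reaches a fixed length, so translation can start before the sentence ends. The generator below is a hypothetical sketch of that idea, not the paper's method: the name `length_based_segments`, the `max_len` threshold and the extra rule of also cutting at Chinese punctuation (which ASR output may lack) are all assumptions.

```python
from typing import Iterable, Iterator, List

# Assumption: punctuation marks treated as segment boundaries for Chinese text.
PUNCT = frozenset({"，", "。", "？", "！", "；"})


def length_based_segments(stream: Iterable[str], max_len: int,
                          punct: frozenset = PUNCT) -> Iterator[List[str]]:
    """Yield segments of the token stream as soon as they are complete.

    A segment closes when a boundary punctuation token arrives or when the
    buffer reaches max_len tokens; any remainder is flushed at stream end.
    """
    buf: List[str] = []
    for tok in stream:
        buf.append(tok)
        if tok in punct or len(buf) >= max_len:
            yield buf
            buf = []
    if buf:
        yield buf
```

With `max_len=4`, the character stream 我们今天开会。明天见 is released as 我们今天 (length cut), 开会。 (punctuation cut) and 明天见 (end of stream). A boundary detection model would replace the fixed-length rule with a learned decision about where each segment ends.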

2017

BIT at SemEval-2017 Task 1: Using Semantic Information Space to Evaluate Semantic Textual Similarity
Hao Wu | Heyan Huang | Ping Jian | Yuhang Guo | Chao Su
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper presents three systems for semantic textual similarity (STS) evaluation at the SemEval-2017 STS task. One is an unsupervised system, and the other two are supervised systems that simply employ the unsupervised one. All our systems mainly depend on the semantic information space (SIS), which is constructed from the semantic hierarchical taxonomy in WordNet, to compute the non-overlapping information content (IC) of sentences. Our team ranked 2nd among 31 participating teams by the primary score, the mean Pearson correlation coefficient (PCC) over 7 tracks, and achieved the best performance on the Track 1 (AR-AR) dataset.

A Parallel Recurrent Neural Network for Language Modeling with POS Tags
Chao Su | Heyan Huang | Shumin Shi | Yuhang Guo | Hao Wu
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

2013

Microblog Entity Linking by Leveraging Extra Posts
Yuhang Guo | Bing Qin | Ting Liu | Sheng Li
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2011

A Graph-based Method for Entity Linking
Yuhang Guo | Wanxiang Che | Ting Liu | Sheng Li
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

HIT-CIR: An Unsupervised WSD System Based on Domain Most Frequent Sense Estimation
Yuhang Guo | Wanxiang Che | Wei He | Ting Liu | Sheng Li
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

Multilingual Dependency-based Syntactic and Semantic Parsing
Wanxiang Che | Zhenghua Li | Yongqiang Li | Yuhang Guo | Bing Qin | Ting Liu
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

2007

HIT-IR-WSD: A WSD System for English Lexical Sample Task
Yuhang Guo | Wanxiang Che | Yuxuan Hu | Wei Zhang | Ting Liu
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)