Yanzhi Tian
2022
BIT-Xiaomi’s System for AutoSimTrans 2022
Mengge Liu | Xiang Li | Bao Chen | Yanzhi Tian | Tianwei Lan | Silin Li | Yuhang Guo | Jian Luan | Bin Wang
Proceedings of the Third Workshop on Automatic Simultaneous Translation
This system paper describes the BIT-Xiaomi simultaneous translation system for the AutoSimTrans 2022 simultaneous translation challenge. We participated in three tracks: the Zh-En text-to-text track, the Zh-En audio-to-text track, and the En-Es text-to-text track. In our system, wait-k is employed to train prefix-to-prefix translation models. We integrate streaming chunking to detect boundaries as the source stream is read in. We further improve our system with data selection, data augmentation, and R-drop training methods. Results show that our wait-k implementation outperforms the organizer’s baseline by up to 8 BLEU, and our proposed streaming chunking method brings a further improvement of about 2 BLEU in the low-latency regime.
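The abstract names wait-k without spelling out its schedule. The following is a minimal sketch of the standard wait-k policy, assuming token-level granularity: the model first reads k source tokens, then alternates one write per read, so target token t is produced after g(t) = min(k + t − 1, |x|) source tokens. The function names are hypothetical and not taken from the BIT-Xiaomi system, and the sketch does not cover streaming chunking, data selection, or R-drop.

```python
from typing import List


def wait_k_read_counts(src_len: int, tgt_len: int, k: int) -> List[int]:
    """g(t) for t = 1..tgt_len: how many source tokens have been read
    when target token t is written, i.e. min(k + t - 1, src_len)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]


def wait_k_actions(src_len: int, tgt_len: int, k: int) -> List[str]:
    """Inference view: interleave READ and WRITE actions per the schedule."""
    actions: List[str] = []
    read = 0
    for g in wait_k_read_counts(src_len, tgt_len, k):
        while read < g:  # read until g(t) source tokens are available
            actions.append("READ")
            read += 1
        actions.append("WRITE")  # then emit one target token
    return actions


def wait_k_source_mask(src_len: int, tgt_len: int, k: int) -> List[List[bool]]:
    """Training view (prefix-to-prefix): row t marks which source positions
    target step t may attend to, namely the first g(t) tokens."""
    g = wait_k_read_counts(src_len, tgt_len, k)
    return [[j < g[t] for j in range(src_len)] for t in range(tgt_len)]


print(wait_k_actions(src_len=5, tgt_len=4, k=2))
# ['READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE']
```

The same g(t) drives both views: at training time it masks the source each target step can see, and at inference time it decides when to read more input. In a real decoder the target length is not known in advance; writing continues until an end-of-sentence token, and once the source is exhausted every remaining step is a WRITE.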
Ancient Chinese Word Segmentation and Part-of-Speech Tagging Using Data Augmentation
Yanzhi Tian | Yuhang Guo
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
We participated in the EvaHan2022 evaluation of ancient Chinese word segmentation and part-of-speech (POS) tagging. We treat word segmentation and POS tagging as sequence tagging tasks. Our system is based on a BERT-BiLSTM-CRF model trained on the data provided by the EvaHan2022 evaluation. In addition, we employ data augmentation techniques to improve the performance of our model. On Test A and Test B of the evaluation, our system achieves F1 scores of 94.73% and 90.93% for word segmentation, and 89.19% and 83.48% for POS tagging.
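The abstract frames segmentation and POS tagging as sequence tagging but does not state the tag set. A common way to do this is character-level BMES labels (Begin, Middle, End, Single), optionally joined with the POS label so that one tagger covers both tasks; the sketch below assumes that joint scheme, which may differ from the authors' setup. The function names are hypothetical, the POS labels in the example are illustrative rather than gold EvaHan2022 annotations, and the BERT-BiLSTM-CRF model that would predict the tags is not shown.

```python
from typing import List, Tuple


def encode_bmes(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Convert (word, POS) pairs into per-character (char, tag) pairs.
    Each tag is a position label (B/M/E/S) joined to the word's POS."""
    pairs: List[Tuple[str, str]] = []
    for word, pos in words:
        if len(word) == 1:
            pairs.append((word, f"S-{pos}"))  # single-character word
            continue
        for i, ch in enumerate(word):
            if i == 0:
                place = "B"  # first character of the word
            elif i == len(word) - 1:
                place = "E"  # last character
            else:
                place = "M"  # interior character
            pairs.append((ch, f"{place}-{pos}"))
    return pairs


def decode_bmes(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Rebuild (word, POS) pairs from character tags, e.g. model output.
    A word ends at E or S; its POS is taken from its last character's tag."""
    words: List[Tuple[str, str]] = []
    buf, buf_pos = "", ""
    for ch, tag in zip(chars, tags):
        place, pos = tag.split("-", 1)
        if place in ("B", "S") and buf:
            # A new word starts while one is still open: close the open one.
            words.append((buf, buf_pos))
            buf = ""
        buf += ch
        buf_pos = pos
        if place in ("E", "S"):
            words.append((buf, buf_pos))
            buf = ""
    if buf:  # flush a word left open at the end of the sentence
        words.append((buf, buf_pos))
    return words


# Illustrative sentence; the POS labels are examples, not gold EvaHan2022 data.
sentence = [("天下", "n"), ("之", "u"), ("民", "n")]
tagged = encode_bmes(sentence)
print(tagged)  # [('天', 'B-n'), ('下', 'E-n'), ('之', 'S-u'), ('民', 'S-n')]
chars, tags = zip(*tagged)
print(decode_bmes(list(chars), list(tags)) == sentence)  # True
```

Joining position and POS into one label lets a single CRF layer enforce consistency between the two tasks; the decoder tolerates malformed tag sequences (such as B followed by B) by closing the open word rather than failing.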
Co-authors
- Yuhang Guo: 2 papers
- Mengge Liu: 1 paper
- Xiang Li (李翔): 1 paper
- Bao Chen: 1 paper
- Tianwei Lan: 1 paper