Huaipeng Zhao
2018
Distilling Knowledge for Search-based Structured Prediction
Yijia Liu | Wanxiang Che | Huaipeng Zhao | Bing Qin | Ting Liu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Many natural language processing tasks can be modeled as structured prediction and solved as a search problem. In this paper, we distill an ensemble of multiple models trained with different initializations into a single model. In addition to learning to match the ensemble’s probability output on the reference states, we also use the ensemble to explore the search space and learn from the states encountered during the exploration. Experimental results on two typical search-based structured prediction tasks – transition-based dependency parsing and neural machine translation – show that distillation can effectively improve the single model’s performance. The final model achieves improvements of 1.32 LAS and 2.65 BLEU on these two tasks respectively over strong baselines, outperforming the greedy structured prediction models in the previous literature.
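The core objective the abstract describes is training a single model to match the ensemble's averaged action distribution at search states (both reference states and states reached by exploration). Below is a minimal sketch of such a distillation loss, assuming a softmax output over candidate transition actions; the function and variable names are illustrative and are not the paper's implementation.

```python
# Hypothetical sketch of an ensemble-distillation loss for search-based
# structured prediction: the student is trained with cross-entropy against
# the ensemble's averaged ("soft") action distribution at each search state.
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, ensemble_probs):
    """Cross-entropy between the ensemble's soft action distribution and the
    student's distribution, averaged over a batch of search states."""
    student_probs = softmax(student_logits)
    return -(ensemble_probs * np.log(student_probs + 1e-12)).sum(axis=-1).mean()

# Toy example: 3 search states, 4 candidate transition actions, 5-model ensemble.
rng = np.random.default_rng(0)
teacher_logits = [rng.normal(size=(3, 4)) for _ in range(5)]
ensemble_probs = np.mean([softmax(l) for l in teacher_logits], axis=0)
student_logits = rng.normal(size=(3, 4))
print(distill_loss(student_logits, ensemble_probs))
```

In the exploration setting described in the abstract, the batch of states would be collected by following the ensemble's own policy through the search space rather than the gold transition sequence.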
2017
The HIT-SCIR System for End-to-End Parsing of Universal Dependencies
Wanxiang Che | Jiang Guo | Yuxuan Wang | Bo Zheng | Huaipeng Zhao | Yang Liu | Dechuan Teng | Ting Liu
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
This paper describes our system (HIT-SCIR) for the CoNLL 2017 shared task: Multilingual Parsing from Raw Text to Universal Dependencies. Our system includes three pipelined components: tokenization, part-of-speech (POS) tagging, and dependency parsing. We use character-based bidirectional long short-term memory (LSTM) networks for both tokenization and POS tagging. Afterwards, we employ a list-based transition-based algorithm for general non-projective parsing and present an improved Stack-LSTM-based architecture for representing each transition state and making predictions. Furthermore, to parse low/zero-resource languages and cross-domain data, we use a model transfer approach to make effective use of existing resources. We demonstrate substantial gains over the UDPipe baseline, with an average improvement of 3.76% LAS across all languages. Finally, we rank 4th on the official test sets.
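The tokenization and POS tagging components the abstract describes are character-based bidirectional LSTMs. Below is a minimal sketch of such a tagger in PyTorch, assuming standard embedding, LSTM, and linear layers; the class name, dimensions, and usage are illustrative and do not reproduce the HIT-SCIR system.

```python
# Hypothetical sketch of a character-based BiLSTM tagger of the kind used for
# tokenization and POS tagging: each character is embedded, run through a
# bidirectional LSTM, and scored against a tag set per position.
import torch
import torch.nn as nn

class CharBiLSTMTagger(nn.Module):
    def __init__(self, n_chars, n_tags, char_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, n_tags)

    def forward(self, char_ids):                 # char_ids: (batch, seq_len)
        h, _ = self.bilstm(self.embed(char_ids))  # (batch, seq_len, 2*hidden)
        return self.out(h)                        # per-character tag scores

# Toy usage: score a batch of 2 character sequences of length 10.
model = CharBiLSTMTagger(n_chars=100, n_tags=5)
scores = model(torch.randint(0, 100, (2, 10)))
print(scores.shape)  # torch.Size([2, 10, 5])
```

In the pipeline the abstract outlines, a tagger like this would first predict token boundaries over raw characters, a second one would assign POS tags, and the resulting tokens would be fed to the transition-based dependency parser.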