Wei He


2018

pdf
DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications
Wei He | Kai Liu | Jing Liu | Yajuan Lyu | Shiqi Zhao | Xinyan Xiao | Yuan Liu | Yizhong Wang | Hua Wu | Qiaoqiao She | Xuan Liu | Tian Wu | Haifeng Wang
Proceedings of the Workshop on Machine Reading for Question Answering

This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset, designed to address real-world MRC. DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are based on Baidu Search and Baidu Zhidao; answers are manually generated. (2) question types: it provides rich annotations for more question types, especially yes-no and opinion questions, that leaves more opportunity for the research community. (3) scale: it contains 200K questions, 420K answers and 1M documents; it is the largest Chinese MRC dataset so far. Experiments show that human performance is well above current state-of-the-art baseline systems, leaving plenty of room for the community to make improvements. To help the community make these improvements, both DuReader and baseline systems have been posted online. We also organize a shared competition to encourage the exploration of more models. Since the release of the task, there are significant improvements over the baselines.

pdf
Multi-Passage Machine Reading Comprehension with Cross-Passage Answer Verification
Yizhong Wang | Kai Liu | Jing Liu | Wei He | Yajuan Lyu | Hua Wu | Sujian Li | Haifeng Wang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Machine reading comprehension (MRC) on real web data usually requires the machine to answer a question by analyzing multiple passages retrieved by search engine. Compared with MRC on a single passage, multi-passage MRC is more challenging, since we are likely to get multiple confusing answer candidates from different passages. To address this problem, we propose an end-to-end neural model that enables those answer candidates from different passages to verify each other based on their content representations. Specifically, we jointly train three modules that can predict the final answer based on three factors: the answer boundary, the answer content and the cross-passage answer verification. The experimental results show that our method outperforms the baseline by a large margin and achieves the state-of-the-art performance on the English MS-MARCO dataset and the Chinese DuReader dataset, both of which are designed for MRC in real-world settings.

pdf
Answer-focused and Position-aware Neural Question Generation
Xingwu Sun | Jing Liu | Yajuan Lyu | Wei He | Yanjun Ma | Shi Wang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we focus on the problem of question generation (QG). Recent neural network-based approaches employ the sequence-to-sequence model which takes an answer and its context as input and generates a relevant question as output. However, we observe two major issues with these approaches: (1) The generated interrogative words (or question words) do not match the answer type. (2) The model copies the context words that are far from and irrelevant to the answer, instead of the words that are close and relevant to the answer. To address these two issues, we propose an answer-focused and position-aware neural question generation model. (1) By answer-focused, we mean that we explicitly model question word generation by incorporating the answer embedding, which can help generate an interrogative word matching the answer type. (2) By position-aware, we mean that we model the relative distance between the context words and the answer. Hence the model can be aware of the position of the context words when copying them to generate a question. We conduct extensive experiments to examine the effectiveness of our model. The experimental results show that our model significantly improves the baseline and outperforms the state-of-the-art system.

2016

pdf
Chinese Poetry Generation with Planning based Neural Network
Zhe Wang | Wei He | Hua Wu | Haiyang Wu | Wei Li | Haifeng Wang | Enhong Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Chinese poetry generation is a very challenging task in natural language processing. In this paper, we propose a novel two-stage poetry generating method which first plans the sub-topics of the poem according to the user’s writing intent, and then generates each line of the poem sequentially, using a modified recurrent neural network encoder-decoder framework. The proposed planning-based method can ensure that the generated poem is coherent and semantically consistent with the user’s intent. A comprehensive evaluation with human judgments demonstrates that our proposed approach outperforms the state-of-the-art poetry generating methods and the poem quality is somehow comparable to human poets.

pdf
Minimum Risk Training for Neural Machine Translation
Shiqi Shen | Yong Cheng | Zhongjun He | Wei He | Hua Wu | Maosong Sun | Yang Liu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Semi-Supervised Learning for Neural Machine Translation
Yong Cheng | Wei Xu | Zhongjun He | Wei He | Hua Wu | Maosong Sun | Yang Liu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf
Rule-Based Weibo Messages Sentiment Polarity Classification towards Given Topics
Hongzhao Zhou | Yonglin Teng | Min Hou | Wei He | Hongtao Zhu | Xiaolin Zhu | Yanfei Mu
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

pdf
Multi-Task Learning for Multiple Language Translation
Daxiang Dong | Hua Wu | Wei He | Dianhai Yu | Haifeng Wang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf
Improve Statistical Machine Translation with Context-Sensitive Bilingual Semantic Embedding Model
Haiyang Wu | Daxiang Dong | Xiaoguang Hu | Dianhai Yu | Wei He | Hua Wu | Haifeng Wang | Ting Liu
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2012

pdf
Improve SMT Quality with Automatically Extracted Paraphrase Rules
Wei He | Hua Wu | Haifeng Wang | Ting Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf
Enriching SMT Training Data via Paraphrasing
Wei He | Shiqi Zhao | Haifeng Wang | Ting Liu
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf
CMDMC: A Diachronic Digital Museum of Chinese Mandarin
Min Hou | Yu Zou | Yonglin Teng | Wei He | Yan Wang | Jun Liu | Jiyuan Wu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf
HIT-CIR: An Unsupervised WSD System Based on Domain Most Frequent Sense Estimation
Yuhang Guo | Wanxiang Che | Wei He | Ting Liu | Sheng Li
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

pdf
Dependency Based Chinese Sentence Realization
Wei He | Haifeng Wang | Yuqing Guo | Ting Liu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2005

pdf
Automated Generalization of Phrasal Paraphrases from the Web
Weigang Li | Ting Liu | Yu Zhang | Sheng Li | Wei He
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)