Hao Wang


Toward Knowledge-Enriched Conversational Recommendation Systems
Tong Zhang | Yong Liu | Boyang Li | Peixiang Zhong | Chen Zhang | Hao Wang | Chunyan Miao
Proceedings of the 4th Workshop on NLP for Conversational AI

Conversational Recommendation Systems recommend items through language-based interactions with users. In order to generate naturalistic conversations and effectively utilize knowledge graphs (KGs) containing background information, we propose a novel Bag-of-Entities loss, which encourages the generated utterances to mention concepts related to the item being recommended, such as the genre or director of a movie. We also propose an alignment loss to further integrate KG entities into the response generation network. Experiments on the large-scale REDIAL dataset demonstrate that the proposed system consistently outperforms state-of-the-art baselines.

IMCI: Integrate Multi-view Contextual Information for Fact Extraction and Verification
Hao Wang | Yangguang Li | Zhen Huang | Yong Dou
Proceedings of the 29th International Conference on Computational Linguistics

With the rapid development of automatic fake news detection technology, fact extraction and verification (FEVER) has been attracting more attention. The task aims to extract the most relevant evidence from millions of open-domain Wikipedia documents and then verify the credibility of the corresponding claims. Although several strong models have been proposed for the task and have made great progress, we argue that they fail to utilize multi-view contextual information and thus cannot obtain better performance. In this paper, we propose to integrate multi-view contextual information (IMCI) for fact extraction and verification. For each evidence sentence, we define two kinds of context, i.e. intra-document context and inter-document context. Intra-document context consists of the document title and all the other sentences from the same document. Inter-document context consists of all other evidence sentences, which may come from different documents. We then integrate the multi-view contextual information to encode the evidence sentences. Our experimental results on the FEVER 1.0 shared task show that our IMCI framework makes great progress on both fact extraction and verification, and achieves state-of-the-art performance with a winning FEVER score of 73.96% and label accuracy of 77.25% on the online blind test set. We also conduct an ablation study to assess the impact of multi-view contextual information.
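
The two context views defined in this abstract can be sketched as plain data construction. This is a minimal illustration, not the authors' implementation: the `Evidence` type, the `build_contexts` function, and the toy documents are all hypothetical, and the encoding step that consumes these contexts is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    doc_title: str  # title of the document the sentence comes from
    sent_id: int    # index of the sentence inside that document


def build_contexts(target, evidences, documents):
    """Return (intra_document_context, inter_document_context) for one evidence.

    `documents` maps a title to its list of sentences (hypothetical toy format).
    """
    sentences = documents[target.doc_title]
    # Intra-document view: the title plus every other sentence of the same document.
    intra = [target.doc_title] + [
        s for i, s in enumerate(sentences) if i != target.sent_id
    ]
    # Inter-document view: all other evidence sentences,
    # which may come from different documents.
    inter = [documents[e.doc_title][e.sent_id] for e in evidences if e != target]
    return intra, inter


documents = {
    "Paris": ["Paris is the capital of France.", "It lies on the Seine."],
    "France": ["France is a country in Europe."],
}
evidences = [Evidence("Paris", 0), Evidence("France", 0)]
intra, inter = build_contexts(evidences[0], evidences, documents)
print(intra)
print(inter)
```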

R2F: A General Retrieval, Reading and Fusion Framework for Document-level Natural Language Inference
Hao Wang | Yixin Cao | Yangguang Li | Zhen Huang | Kun Wang | Jing Shao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Document-level natural language inference (DOCNLI) is a new challenging task in natural language processing, aiming at judging the entailment relationship between a pair of hypothesis and premise documents. Current datasets and baselines largely follow sentence-level settings, but fail to address the issues raised by longer documents. In this paper, we establish a general solution, named the Retrieval, Reading and Fusion (R2F) framework, and a new setting, by analyzing the main challenges of DOCNLI: interpretability, long-range dependency, and cross-sentence inference. The basic idea of the framework is to simplify the document-level task into a set of sentence-level tasks, and improve both performance and interpretability with the power of evidence. For each hypothesis sentence, the framework retrieves evidence sentences from the premise, and reads them to estimate its credibility. Then the sentence-level results are fused to judge the relationship between the documents. For the setting, we contribute complementary evidence and entailment label annotation on hypothesis sentences, for interpretability studies. Our experimental results show that the R2F framework can obtain state-of-the-art performance and is robust to diverse evidence retrieval methods. Moreover, it can give more interpretable prediction results. Our model and code are released at https://github.com/phoenixsecularbird/R2F.
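
The retrieve, read, and fuse steps described in this abstract can be sketched with toy stand-ins. Everything below is an assumption made for illustration: token overlap replaces the retriever, token coverage replaces the trained reader, and the "every sentence must be supported" fusion rule and the label names are hypothetical. The released code at the URL above is the authoritative version.

```python
def tokens(text):
    """Lowercased bag of tokens with full stops removed."""
    return set(text.lower().replace(".", " ").split())


def retrieve(hypothesis_sentence, premise_sentences, k=1):
    """Stand-in retriever: rank premise sentences by token overlap."""
    query = tokens(hypothesis_sentence)
    ranked = sorted(
        premise_sentences, key=lambda s: len(query & tokens(s)), reverse=True
    )
    return ranked[:k]


def read(hypothesis_sentence, evidence):
    """Stand-in reader: fraction of hypothesis tokens covered by the evidence."""
    query = tokens(hypothesis_sentence)
    covered = set().union(*(tokens(e) for e in evidence))
    return len(query & covered) / len(query) if query else 0.0


def fuse(scores, threshold=0.5):
    """Assumed fusion rule: entailed only if every hypothesis sentence is supported."""
    supported = bool(scores) and min(scores) >= threshold
    return "entailment" if supported else "not_entailment"


premise = ["The cat sat on the mat.", "It was raining outside."]
hypothesis = ["The cat sat on the mat.", "The dog barked loudly."]

scores = []
for sentence in hypothesis:
    evidence = retrieve(sentence, premise)  # sentence-level evidence, kept for inspection
    scores.append(read(sentence, evidence))
label = fuse(scores)
print(scores, label)
```

Keeping the per-sentence evidence and scores around is what makes the document-level decision inspectable: the second hypothesis sentence is the one that fails here.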


融合零指代识别的篇章级机器翻译(Context-aware Machine Translation Integrating Zero Pronoun Recognition)
Hao Wang (汪浩) | Junhui Li (李军辉) | Zhengxian Gong (贡正仙)
Proceedings of the 20th Chinese National Conference on Computational Linguistics



Bayes-enhanced Lifelong Attention Networks for Sentiment Classification
Hao Wang | Shuai Wang | Sahisnu Mazumder | Bing Liu | Yan Yang | Tianrui Li
Proceedings of the 28th International Conference on Computational Linguistics

The classic deep learning paradigm learns a model from the training data of a single task and the learned model is also tested on the same task. This paper studies the problem of learning a sequence of tasks (sentiment classification tasks in our case). After each sentiment classification task is learned, its knowledge is retained to help future task learning. Following this setting, we explore attention neural networks and propose a Bayes-enhanced Lifelong Attention Network (BLAN). The key idea is to exploit the generative parameters of naive Bayes to learn attention knowledge. The learned knowledge from each task is stored in a knowledge base and later used to build lifelong attentions. The constructed lifelong attentions are then used to enhance the attention of the network to help new task learning. Experimental results on product reviews from Amazon.com show the effectiveness of the proposed model.

Argumentation Mining on Essays at Multi Scales
Hao Wang | Zhen Huang | Yong Dou | Yu Hong
Proceedings of the 28th International Conference on Computational Linguistics

Argumentation mining on essays is a new challenging task in natural language processing, which aims to identify the types and locations of argumentation components. Recent research mainly models the task as a sequence tagging problem and deals with all the argumentation components at word level. However, this task is not scale-independent. Some types of argumentation components, which serve as core opinions on essays or paragraphs, are at essay level or paragraph level. The sequence tagging method reasons over local context words, and fails to effectively mine these components. To this end, we propose a multi-scale argumentation mining model, where we mine different types of argumentation components at their corresponding levels. In addition, an effective coarse-to-fine argumentation fusion mechanism is proposed to further improve the performance. We conduct a series of experiments on the Persuasive Essay dataset (PE2.0). Experimental results indicate that our model outperforms existing models on mining all types of argumentation components.

Towards Persona-Based Empathetic Conversational Models
Peixiang Zhong | Chen Zhang | Hao Wang | Yong Liu | Chunyan Miao
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Empathetic conversational models have been shown to improve user satisfaction and task outcomes in numerous domains. In Psychology, persona has been shown to be highly correlated to personality, which in turn influences empathy. In addition, our empirical analysis also suggests that persona plays an important role in empathetic conversations. To this end, we propose a new task towards persona-based empathetic conversations and present the first empirical study on the impact of persona on empathetic responding. Specifically, we first present a novel large-scale multi-domain dataset for persona-based empathetic conversations. We then propose CoBERT, an efficient BERT-based response selection model that obtains the state-of-the-art performance on our dataset. Finally, we conduct extensive experiments to investigate the impact of persona on empathetic responding. Notably, our results show that persona improves empathetic responding more when CoBERT is trained on empathetic conversations than non-empathetic ones, establishing an empirical link between persona and empathy in human conversations.

Entity-Aware Dependency-Based Deep Graph Attention Network for Comparative Preference Classification
Nianzu Ma | Sahisnu Mazumder | Hao Wang | Bing Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper studies the task of comparative preference classification (CPC). Given two entities in a sentence, our goal is to classify whether the first (or the second) entity is preferred over the other or no comparison is expressed at all between the two entities. Existing works either do not learn entity-aware representations well and fail to deal with sentences involving multiple entity pairs or use sequential modeling approaches that are unable to capture long-range dependencies between the entities. Some also use traditional machine learning approaches that do not generalize well. This paper proposes a novel Entity-aware Dependency-based Deep Graph Attention Network (ED-GAT) that employs a multi-hop graph attention over a dependency graph sentence representation to leverage both the semantic information from word embeddings and the syntactic information from the dependency graph to solve the problem. Empirical evaluation shows that the proposed model achieves the state-of-the-art performance in comparative preference classification.


Learning with Noisy Labels for Sentence-level Sentiment Classification
Hao Wang | Bing Liu | Chaozhuo Li | Yan Yang | Tianrui Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Deep neural networks (DNNs) can fit (or even over-fit) the training data very well. If a DNN model is trained using data with noisy labels and tested on data with clean labels, the model may perform poorly. This paper studies the problem of learning with noisy labels for sentence-level sentiment classification. We propose a novel DNN model called NetAb (as shorthand for convolutional neural Networks with Ab-networks) to handle noisy labels during training. NetAb consists of two convolutional neural networks, one with a noise transition layer for dealing with the input noisy labels and the other for predicting ‘clean’ labels. We train the two networks using their respective loss functions in a mutual reinforcement manner. Experimental results demonstrate the effectiveness of the proposed model.
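
A noise transition layer is often modelled as a row-stochastic matrix that maps a distribution over clean labels to a distribution over observed noisy labels. The sketch below shows that common formulation only. It is an assumption about the general technique, not a description of NetAb's exact layer, and the function name and the numbers are hypothetical.

```python
def apply_noise_transition(clean_probs, transition):
    """Map a clean-label distribution to a noisy-label distribution.

    transition[i][j] is the assumed probability that clean class i is
    observed as noisy class j, so every row sums to 1.
    """
    num_classes = len(transition)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]


# Hypothetical two-class example: index 0 = positive, index 1 = negative.
clean_probs = [0.9, 0.1]
transition = [
    [0.8, 0.2],  # a clean positive is mislabelled as negative 20% of the time
    [0.3, 0.7],  # a clean negative is mislabelled as positive 30% of the time
]
noisy_probs = apply_noise_transition(clean_probs, transition)
print(noisy_probs)
```

Training against the noisy distribution while reading predictions from the clean one is what lets such a layer absorb label noise. In practice the matrix is learned rather than fixed.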


A Neural Question Answering Model Based on Semi-Structured Tables
Hao Wang | Xiaodong Zhang | Shuming Ma | Xu Sun | Houfeng Wang | Mengxiang Wang
Proceedings of the 27th International Conference on Computational Linguistics

Most question answering (QA) systems are based on raw text and structured knowledge graphs. However, raw text corpora are hard for QA systems to understand, and structured knowledge graphs require intensive manual work, while it is relatively easy to obtain semi-structured tables from many sources directly, or to build them automatically. In this paper, we build an end-to-end system to answer multiple choice questions with semi-structured tables as its knowledge. Our system answers queries in two steps. First, it finds the most similar tables. Then the system measures the relevance between each question and the candidate table cells, and chooses the most related cell as the source of the answer. The system is evaluated on the TabMCQ dataset, and achieves a large improvement over the state of the art.
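
The two-step procedure in this abstract can be sketched with a toy similarity function. This is a hedged illustration only: the paper uses a neural model, whereas `overlap` is a token-overlap stand-in, and the rule that scores a candidate cell by how well its row matches the question is an assumption, as are the function names and the toy tables.

```python
def overlap(a, b):
    """Token-overlap similarity (a stand-in for a learned relevance model)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def table_text(table):
    """Flatten a table (list of rows, each a list of cells) into one string."""
    return " ".join(" ".join(row) for row in table)


def answer(question, choices, tables):
    # Step 1: find the table most similar to the question.
    best_table = max(tables, key=lambda t: overlap(question, table_text(t)))
    # Step 2: among cells that match an answer choice, pick the one whose
    # row is most relevant to the question.
    best_choice, best_score = None, -1
    for row in best_table:
        row_score = overlap(question, " ".join(row))
        for cell in row:
            if cell in choices and row_score > best_score:
                best_choice, best_score = cell, row_score
    return best_choice


tables = [
    [["spring", "northern hemisphere", "march"],
     ["winter", "northern hemisphere", "december"]],
    [["cat", "mammal"], ["frog", "amphibian"]],
]
question = "a frog is an example of which animal class"
choices = ["mammal", "amphibian", "reptile"]
prediction = answer(question, choices, tables)
print(prediction)
```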


Using Argument-based Features to Predict and Analyse Review Helpfulness
Haijing Liu | Yang Gao | Pin Lv | Mengxue Li | Shiqiang Geng | Minglan Li | Hao Wang
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We study the problem of identifying helpful product reviews in this paper. We observe that evidence-conclusion discourse relations, also known as arguments, often appear in product reviews, and we hypothesise that some argument-based features, e.g. the percentage of argumentative sentences and the evidence-conclusion ratio, are good indicators of helpful reviews. To validate this hypothesis, we manually annotate arguments in 110 hotel reviews, and investigate the effectiveness of several combinations of argument-based features. Experiments suggest that, when used together with the argument-based features, the state-of-the-art baseline features enjoy a performance boost (in terms of F1) of 11.01% on average.
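
The two example features named in this abstract can be computed from per-sentence annotations. The sketch below assumes a hypothetical labelling scheme with one label per sentence ("evidence", "conclusion", or "none"). The paper's actual annotation format and feature definitions may differ.

```python
def argument_features(sentence_labels):
    """Compute two illustrative argument-based features for one review."""
    total = len(sentence_labels)
    evidence = sentence_labels.count("evidence")
    conclusions = sentence_labels.count("conclusion")
    argumentative = evidence + conclusions
    return {
        # Share of sentences that take part in an argument.
        "pct_argumentative": argumentative / total if total else 0.0,
        # Guard against division by zero when a review has no conclusion.
        "evidence_conclusion_ratio": evidence / conclusions if conclusions else 0.0,
    }


labels = ["evidence", "evidence", "conclusion", "none"]
features = argument_features(labels)
print(features)
```

Features like these would be concatenated with the baseline features before training a helpfulness classifier.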

Unsupervised Bilingual Segmentation using MDL for Machine Translation
Bin Shan | Hao Wang | Yves Lepage
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

BTG-based Machine Translation with Simple Reordering Model using Structured Perceptron
Hao Wang | Yves Lepage
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

A Transition-based System for Universal Dependency Parsing
Hao Wang | Hai Zhao | Zhisong Zhang
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This paper describes the system for our participation in the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In this work, we design a system based on UDPipe for universal dependency parsing, where multilingual transition-based models are trained for different treebanks. Our system directly takes raw texts as input, performs several intermediate steps such as tokenizing and tagging, and finally generates the corresponding dependency trees. For the surprise languages in this task, we adopt a delexicalized strategy and predict based on transfer learning from other related languages. In the final evaluation of the shared task, our system achieves a result of 66.53% in macro-averaged LAS F1-score.


Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments
Hao Wang | Yves Lepage
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

fast_align is a simple and fast word alignment tool which is widely used in state-of-the-art machine translation systems. It yields comparable results in the end-to-end translation experiments of various language pairs. However, fast_align does not perform as well as GIZA++ when applied to language pairs with distinct word orders, like English and Japanese. In this paper, given the lexical translation table output by fast_align, we propose to realign words using the hierarchical sub-sentential alignment approach. Experimental results show that this simple additional processing improves the performance of word alignment, which is measured by counting alignment matches in comparison with fast_align. We also report the results of final machine translation in both English-Japanese and Japanese-English. We show that our best system provides significant improvements over the baseline as measured by BLEU and RIBES.

HSSA tree structures for BTG-based preordering in machine translation
Yujia Zhang | Hao Wang | Yves Lepage
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

Yet Another Symmetrical and Real-time Word Alignment Method: Hierarchical Sub-sentential Alignment using F-measure
Hao Wang | Yves Lepage
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers


結合ANN、全域變異數與真實軌跡挑選之基週軌跡產生方法 (A Pitch-contour Generation Method Combining ANN Prediction, Global Variance Matching, and Real-contour Selection) [In Chinese]
Hung-Yan Gu | Kai-Wei Jiang | Hao Wang
Proceedings of the 27th Conference on Computational Linguistics and Speech Processing (ROCLING 2015)

Translation of Unseen Bigrams by Analogy Using an SVM Classifier
Hao Wang | Lu Lyu | Yves Lepage
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation


A Sentiment-aligned Topic Model for Product Aspect Rating Prediction
Hao Wang | Martin Ester
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)


A Dataset for Research on Short-Text Conversations
Hao Wang | Zhengdong Lu | Hang Li | Enhong Chen
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing


A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
Hao Wang | Dogan Can | Abe Kazemzadeh | François Bar | Shrikanth Narayanan
Proceedings of the ACL 2012 System Demonstrations