Xiangnan He


Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning
Moxin Li | Fuli Feng | Hanwang Zhang | Xiangnan He | Fengbin Zhu | Tat-Seng Chua
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural discrete reasoning (NDR) has shown remarkable progress in combining deep models with discrete reasoning. However, we find that existing NDR solutions suffer from a large performance drop on hypothetical questions, e.g. “what the annualized rate of return would be if the revenue in 2020 was doubled”. The key to hypothetical question answering (HQA) is counterfactual thinking, which is a natural ability of human reasoning but difficult for deep models. In this work, we devise a Learning to Imagine (L2I) module, which can be seamlessly incorporated into NDR models to perform the imagination of unseen counterfactuals. In particular, we formulate counterfactual thinking into two steps: 1) identifying the fact to intervene on, and 2) deriving the counterfactual from the fact and assumption, which are designed as neural networks. Based on TAT-QA, we construct a very challenging HQA dataset with 8,283 hypothetical questions. We apply the proposed L2I to TAGOP, the state-of-the-art solution on TAT-QA, validating the rationality and effectiveness of our approach.

DebiasGAN: Eliminating Position Bias in News Recommendation with Adversarial Learning
Chuhan Wu | Fangzhao Wu | Xiangnan He | Yongfeng Huang
Findings of the Association for Computational Linguistics: EMNLP 2022

Click behaviors are widely used for learning news recommendation models, but they are heavily affected by the biases brought by the news display positions. It is important to remove position biases to train an unbiased recommendation model and capture unbiased user interest. In this paper, we propose a news recommendation method named DebiasGAN that can effectively alleviate position biases via adversarial learning. The core idea is modeling the personalized effect of position bias on click behaviors in a candidate-aware way, and learning debiased candidate-aware user embeddings from which the position information cannot be discriminated. More specifically, we use a bias-aware click model to capture the effect of position bias on click behaviors, and use a bias-invariant click model with random candidate positions to estimate the ideally unbiased click scores. We apply adversarial learning to the embeddings learned by the two models to help the bias-invariant click model capture debiased user interest. Experimental results on two real-world datasets show that DebiasGAN effectively improves news recommendation by eliminating position biases.

Alibaba-Translate China’s Submission for WMT 2022 Quality Estimation Shared Task
Keqin Bao | Yu Wan | Dayiheng Liu | Baosong Yang | Wenqiang Lei | Xiangnan He | Derek F. Wong | Jun Xie
Proceedings of the Seventh Conference on Machine Translation (WMT)

In this paper, we present our submission to the sentence-level MQM benchmark at the Quality Estimation Shared Task, named UniTE (Unified Translation Evaluation). Specifically, our systems employ the framework of UniTE, which combines three types of input formats during training with a pre-trained language model. First, we apply the pseudo-labeled data examples for the continued pre-training phase. Notably, to reduce the gap between pre-training and fine-tuning, we use data cropping and a ranking-based score normalization strategy. For the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years’ WMT competitions. Finally, we collect the source-only evaluation results, and ensemble the predictions generated by two UniTE models, whose backbones are XLM-R and InfoXLM, respectively. Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in the English-German and Chinese-English settings, showing relatively strong performance in this year’s quality estimation competition.


Empowering Language Understanding with Counterfactual Reasoning
Fuli Feng | Jizhi Zhang | Xiangnan He | Hanwang Zhang | Tat-Seng Chua
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


Graph-based Aspect Representation Learning for Entity Resolution
Zhenqi Zhao | Yuchen Guo | Dingxian Wang | Yufan Huang | Xiangnan He | Bin Gu
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)

Entity Resolution (ER) identifies records that refer to the same real-world entity. Deep learning approaches have improved the generalization ability of entity matching models, but have hardly overcome the impact of noisy or incomplete data sources. In real scenes, an entity usually consists of multiple semantic facets, called aspects. In this paper, we focus on entity augmentation, namely retrieving the values of missing aspects. The relationship between aspects is naturally suitable to be represented by a knowledge graph, where entity augmentation can be modeled as a link prediction problem. Our paper proposes a novel graph-based approach to solve entity augmentation. Specifically, we apply a dedicated random walk algorithm, which uses node types to limit the traversal length, and encodes graph structure into low-dimensional embeddings. Thus, the missing aspects can be retrieved by a link prediction model. Furthermore, the augmented aspects, in a fixed order, serve as the input of a deep Siamese BiLSTM network for entity matching. We compared our method with state-of-the-art methods through extensive experiments on downstream ER tasks. According to the experimental results, our model outperforms other methods on evaluation metrics (accuracy, precision, recall, and F1-score) to a large extent, which demonstrates the effectiveness of our method.


Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures
Wenqiang Lei | Xisen Jin | Min-Yen Kan | Zhaochun Ren | Xiangnan He | Dawei Yin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Existing solutions to task-oriented dialogue systems follow pipeline designs which introduce architectural complexity and fragility. We propose a novel, holistic, extendable framework based on a single sequence-to-sequence (seq2seq) model which can be optimized with supervised or reinforcement learning. A key contribution is that we design text spans named belief spans to track dialogue beliefs, allowing task-oriented dialogue systems to be modeled in a seq2seq way. Based on this, we propose a simplistic Two Stage CopyNet instantiation which demonstrates good scalability: significantly reducing model complexity in terms of number of parameters and training time by an order of magnitude. It significantly outperforms state-of-the-art pipeline-based methods on large datasets and retains a satisfactory entity match rate on out-of-vocabulary (OOV) cases where pipeline-designed competitors totally fail.

Batch IS NOT Heavy: Learning Word Representations From All Samples
Xin Xin | Fajie Yuan | Xiangnan He | Joemon M. Jose
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Stochastic Gradient Descent (SGD) with negative sampling is the most prevalent approach to learning word representations. However, it is known that sampling methods are biased, especially when the sampling distribution deviates from the true data distribution. Besides, SGD suffers from dramatic fluctuation due to the one-sample learning scheme. In this work, we propose AllVec, which uses batch gradient learning to generate word representations from all training samples. Remarkably, the time complexity of AllVec remains at the same level as SGD, being determined by the number of positive samples rather than all samples. We evaluate AllVec on several benchmark tasks. Experiments show that AllVec outperforms sampling-based SGD methods with comparable efficiency, especially for small training corpora.


Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
Yiping Jin | Min-Yen Kan | Jun-Ping Ng | Xiangnan He
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing