Xiaodong He


2022

pdf
E-ConvRec: A Large-Scale Conversational Recommendation Dataset for E-Commerce Customer Service
Meihuizi Jia | Ruixue Liu | Peiying Wang | Yang Song | Zexi Xi | Haobin Li | Xin Shen | Meng Chen | Jinhui Pang | Xiaodong He
Proceedings of the Thirteenth Language Resources and Evaluation Conference

There has been a growing interest in developing conversational recommendation system (CRS), which provides valuable recommendations to users through conversations. Compared to the traditional recommendation, it advocates wealthier interactions and provides possibilities to obtain users’ exact preferences explicitly. Nevertheless, the corresponding research on this topic is limited due to the lack of broad-coverage dialogue corpus, especially real-world dialogue corpus. To handle this issue and facilitate our exploration, we construct E-ConvRec, an authentic Chinese dialogue dataset consisting of over 25k dialogues and 770k utterances, which contains user profile, product knowledge base (KB), and multiple sequential real conversations between users and recommenders. Next, we explore conversational recommendation in a real scene from multiple facets based on the dataset. Therefore, we particularly design three tasks: user preference recognition, dialogue management, and personalized recommendation. In the light of the three tasks, we establish baseline results on E-ConvRec to facilitate future studies.

pdf
Fine- and Coarse-Granularity Hybrid Self-Attention for Efficient BERT
Jing Zhao | Yifan Wang | Junwei Bao | Youzheng Wu | Xiaodong He
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Transformer-based pre-trained models, such as BERT, have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, deploying these models can be prohibitively costly, as the standard self-attention mechanism of the Transformer suffers from quadratic computational cost in the input sequence length. To confront this, we propose FCA, a fine- and coarse-granularity hybrid self-attention that reduces the computation cost through progressively shortening the computational sequence length in self-attention. Specifically, FCA conducts an attention-based scoring strategy to determine the informativeness of tokens at each layer. Then, the informative tokens serve as the fine-granularity computing units in self-attention and the uninformative tokens are replaced with one or several clusters as the coarse-granularity computing units in self-attention. Experiments on the standard GLUE benchmark show that BERT with FCA achieves 2x reduction in FLOPs over original BERT with <1% loss in accuracy. We show that FCA offers a significantly better trade-off between accuracy and FLOPs compared to prior methods.

pdf
BORT: Back and Denoising Reconstruction for End-to-End Task-Oriented Dialog
Haipeng Sun | Junwei Bao | Youzheng Wu | Xiaodong He
Findings of the Association for Computational Linguistics: NAACL 2022

A typical end-to-end task-oriented dialog system transfers context into dialog state, and upon which generates a response, which usually faces the problem of error propagation from both previously generated inaccurate dialog states and responses, especially in low-resource scenarios. To alleviate these issues, we propose BORT, a back and denoising reconstruction approach for end-to-end task-oriented dialog system. Squarely, to improve the accuracy of dialog states, back reconstruction is used to reconstruct the original input context from the generated dialog states since inaccurate dialog states cannot recover the corresponding input context. To enhance the denoising capability of the model to reduce the impact of error propagation, denoising reconstruction is used to reconstruct the corrupted dialog state and response. Extensive experiments conducted on MultiWOZ 2.0 and CamRest676 show the effectiveness of BORT. Furthermore, BORT demonstrates its advanced capabilities in the zero-shot domain and low-resource scenarios.

pdf
Tracking Satisfaction States for Customer Satisfaction Prediction in E-commerce Service Chatbots
Yang Sun | Liangqing Wu | Shuangyong Song | Xiaoguang Yu | Xiaodong He | Guohong Fu
Proceedings of the 29th International Conference on Computational Linguistics

Due to the increasing use of service chatbots in E-commerce platforms in recent years, customer satisfaction prediction (CSP) is gaining more and more attention. CSP is dedicated to evaluating subjective customer satisfaction in conversational service and thus helps improve customer service experience. However, previous methods focus on modeling customer-chatbot interaction across different turns, which are hard to represent the important dynamic satisfaction states throughout the customer journey. In this work, we investigate the problem of satisfaction states tracking and its effects on CSP in E-commerce service chatbots. To this end, we propose a dialogue-level classification model named DialogueCSP to track satisfaction states for CSP. In particular, we explore a novel two-step interaction module to represent the dynamic satisfaction states at each turn. In order to capture dialogue-level satisfaction states for CSP, we further introduce dialogue-aware attentions to integrate historical informative cues into the interaction module. To evaluate the proposed approach, we also build a Chinese E-commerce dataset for CSP. Experiment results demonstrate that our model significantly outperforms multiple baselines, illustrating the benefits of satisfaction states tracking on CSP.

pdf
Few-Shot Table Understanding: A Benchmark Dataset and Pre-Training Baseline
Ruixue Liu | Shaozu Yuan | Aijun Dai | Lei Shen | Tiangang Zhu | Meng Chen | Xiaodong He
Proceedings of the 29th International Conference on Computational Linguistics

Few-shot table understanding is a critical and challenging problem in real-world scenario as annotations over large amount of tables are usually costly. Pre-trained language models (PLMs), which have recently flourished on tabular data, have demonstrated their effectiveness for table understanding tasks. However, few-shot table understanding is rarely explored due to the deficiency of public table pre-training corpus and well-defined downstream benchmark tasks, especially in Chinese. In this paper, we establish a benchmark dataset, FewTUD, which consists of 5 different tasks with human annotations to systematically explore the few-shot table understanding in depth. Since there is no large number of public Chinese tables, we also collect a large-scale, multi-domain tabular corpus to facilitate future Chinese table pre-training, which includes one million tables and related natural language text with auxiliary supervised interaction signals. Finally, we present FewTPT, a novel table PLM with rich interactions over tabular data, and evaluate its performance comprehensively on the benchmark. Our dataset and model will be released to the public soon.

pdf
Label Anchored Contrastive Learning for Language Understanding
Zhenyu Zhang | Yuming Zhao | Meng Chen | Xiaodong He
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Contrastive learning (CL) has achieved astonishing progress in computer vision, speech, and natural language processing fields recently with self-supervised learning. However, CL approach to the supervised setting is not fully explored, especially for the natural language understanding classification task. Intuitively, the class label itself has the intrinsic ability to perform hard positive/negative mining, which is crucial for CL. Motivated by this, we propose a novel label anchored contrastive learning approach (denoted as LaCon) for language understanding. Specifically, three contrastive objectives are devised, including a multi-head instance-centered contrastive loss (ICL), a label-centered contrastive loss (LCL), and a label embedding regularizer (LER). Our approach does not require any specialized network architecture or any extra data augmentation, thus it can be easily plugged into existing powerful pre-trained language models. Compared to the state-of-the-art baselines, LaCon obtains up to 4.1% improvement on the popular datasets of GLUE and CLUE benchmarks. Besides, LaCon also demonstrates significant advantages under the few-shot and data imbalance settings, which obtains up to 9.4% improvement on the FewGLUE and FewCLUE benchmarking tasks.

pdf
OPERA: Operation-Pivoted Discrete Reasoning over Text
Yongwei Zhou | Junwei Bao | Chaoqun Duan | Haipeng Sun | Jiahui Liang | Yifan Wang | Jing Zhao | Youzheng Wu | Xiaodong He | Tiejun Zhao
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Machine reading comprehension (MRC) that requires discrete reasoning involving symbolic operations, e.g., addition, sorting, and counting, is a challenging task. According to this nature, semantic parsing-based methods predict interpretable but complex logical forms. However, logical form generation is nontrivial and even a little perturbation in a logical form will lead to wrong answers. To alleviate this issue, multi-predictor -based methods are proposed to directly predict different types of answers and achieve improvements. However, they ignore the utilization of symbolic operations and encounter a lack of reasoning ability and interpretability. To inherit the advantages of these two types of methods, we propose OPERA, an operation-pivoted discrete reasoning framework, where lightweight symbolic operations (compared with logical forms) as neural modules are utilized to facilitate the reasoning ability and interpretability. Specifically, operations are first selected and then softly executed to simulate the answer reasoning procedure. Extensive experiments on both DROP and RACENum datasets show the reasoning ability of OPERA. Moreover, further analysis verifies its interpretability.

pdf
Don’t Take It Literally: An Edit-Invariant Sequence Loss for Text Generation
Guangyi Liu | Zichao Yang | Tianhua Tao | Xiaodan Liang | Junwei Bao | Zhen Li | Xiaodong He | Shuguang Cui | Zhiting Hu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Neural text generation models are typically trained by maximizing log-likelihood with the sequence cross entropy (CE) loss, which encourages an exact token-by-token match between a target sequence with a generated sequence. Such training objective is sub-optimal when the target sequence is not perfect, e.g., when the target sequence is corrupted with noises, or when only weak sequence supervision is available. To address the challenge, we propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence. EISL is designed to be robust to various noises and edits in the target sequences. Moreover, the EISL computation is essentially an approximate convolution operation with target n-grams as kernels, which is easy to implement and efficient to compute with existing libraries. To demonstrate the effectiveness of EISL, we conduct experiments on a wide range of tasks, including machine translation with noisy target sequences, unsupervised text style transfer with only weak training signals, and non-autoregressive generation with non-predefined generation order. Experimental results show our method significantly outperforms the common CE loss and other strong baselines on all the tasks. EISL has a simple API that can be used as a drop-in replacement of the CE loss: https://github.com/guangyliu/EISL.

pdf
LUNA: Learning Slot-Turn Alignment for Dialogue State Tracking
Yifan Wang | Jing Zhao | Junwei Bao | Chaoqun Duan | Youzheng Wu | Xiaodong He
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Dialogue state tracking (DST) aims to predict the current dialogue state given the dialogue history. Existing methods generally exploit the utterances of all dialogue turns to assign value for each slot. This could lead to suboptimal results due to the information introduced from irrelevant utterances in the dialogue history, which may be useless and can even cause confusion. To address this problem, we propose LUNA, a SLot-TUrN Alignment enhanced approach. It first explicitly aligns each slot with its most relevant utterance, then further predicts the corresponding value based on this aligned utterance instead of all dialogue utterances. Furthermore, we design a slot ranking auxiliary task to learn the temporal correlation among slots which could facilitate the alignment. Comprehensive experiments are conducted on three multi-domain task-oriented dialogue datasets, MultiWOZ 2.0, MultiWOZ 2.1, and MultiWOZ 2.2. The results show that LUNA achieves new state-of-the-art results on these datasets.

2021

pdf
RevCore: Review-Augmented Conversational Recommendation
Yu Lu | Junwei Bao | Yan Song | Zichen Ma | Shuguang Cui | Youzheng Wu | Xiaodong He
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce
Song Xu | Haoran Li | Peng Yuan | Yujia Wang | Youzheng Wu | Xiaodong He | Ying Liu | Bowen Zhou
Findings of the Association for Computational Linguistics: EMNLP 2021

Existing pre-trained language models (PLMs) have demonstrated the effectiveness of self-supervised learning for a broad range of natural language processing (NLP) tasks. However, most of them are not explicitly aware of domain-specific knowledge, which is essential for downstream tasks in many domains, such as tasks in e-commerce scenarios. In this paper, we propose K-PLUG, a knowledge-injected pre-trained language model based on the encoder-decoder transformer that can be transferred to both natural language understanding and generation tasks. Specifically, we propose five knowledge-aware self-supervised pre-training objectives to formulate the learning of domain-specific knowledge, including e-commerce domain-specific knowledge-bases, aspects of product entities, categories of product entities, and unique selling propositions of product entities. We verify our method in a diverse range of e-commerce scenarios that require domain-specific knowledge, including product knowledge base completion, abstractive product summarization, and multi-turn dialogue. K-PLUG significantly outperforms baselines across the board, which demonstrates that the proposed method effectively learns a diverse set of domain-specific knowledge for both language understanding and generation tasks. Our code is available.

pdf
RoR: Read-over-Read for Long Document Machine Reading Comprehension
Jing Zhao | Junwei Bao | Yifan Wang | Yongwei Zhou | Youzheng Wu | Xiaodong He | Bowen Zhou
Findings of the Association for Computational Linguistics: EMNLP 2021

Transformer-based pre-trained models, such as BERT, have achieved remarkable results on machine reading comprehension. However, due to the constraint of encoding length (e.g., 512 WordPiece tokens), a long document is usually split into multiple chunks that are independently read. It results in the reading field being limited to individual chunks without information collaboration for long document machine reading comprehension. To address this problem, we propose RoR, a read-over-read method, which expands the reading field from chunk to document. Specifically, RoR includes a chunk reader and a document reader. The former first predicts a set of regional answers for each chunk, which are then compacted into a highly-condensed version of the original document, guaranteeing to be encoded once. The latter further predicts the global answers from this condensed document. Eventually, a voting strategy is utilized to aggregate and rerank the regional and global answers for final prediction. Extensive experiments on two benchmarks QuAC and TriviaQA demonstrate the effectiveness of RoR for long document reading. Notably, RoR ranks 1st place on the QuAC leaderboard (https://quac.ai/) at the time of submission (May 17th, 2021).

pdf
Graph Ensemble Learning over Multiple Dependency Trees for Aspect-level Sentiment Classification
Xiaochen Hou | Peng Qi | Guangtao Wang | Rex Ying | Jing Huang | Xiaodong He | Bowen Zhou
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent work on aspect-level sentiment classification has demonstrated the efficacy of incorporating syntactic structures such as dependency trees with graph neural networks (GNN), but these approaches are usually vulnerable to parsing errors. To better leverage syntactic information in the face of unavoidable errors, we propose a simple yet effective graph ensemble technique, GraphMerge, to make use of the predictions from different parsers. Instead of assigning one set of model parameters to each dependency tree, we first combine the dependency relations from different parses before applying GNNs over the resulting graph. This allows GNN models to be robust to parse errors at no additional computational cost, and helps avoid overparameterization and overfitting from GNN layer stacking by introducing more connectivity into the ensemble graph. Our experiments on the SemEval 2014 Task 4 and ACL 14 Twitter datasets show that our GraphMerge model not only outperforms models with single dependency tree, but also beats other ensemble models without adding model parameters.

pdf
SGG: Learning to Select, Guide, and Generate for Keyphrase Generation
Jing Zhao | Junwei Bao | Yifan Wang | Youzheng Wu | Xiaodong He | Bowen Zhou
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Keyphrases, that concisely summarize the high-level topics discussed in a document, can be categorized into present keyphrase which explicitly appears in the source text and absent keyphrase which does not match any contiguous subsequence but is highly semantically related to the source. Most existing keyphrase generation approaches synchronously generate present and absent keyphrases without explicitly distinguishing these two categories. In this paper, a Select-Guide-Generate (SGG) approach is proposed to deal with present and absent keyphrases generation separately with different mechanisms. Specifically, SGG is a hierarchical neural network which consists of a pointing-based selector at low layer concentrated on present keyphrase generation, a selection-guided generator at high layer dedicated to absent keyphrase generation, and a guider in the middle to transfer information from selector to generator. Experimental results on four keyphrase generation benchmarks demonstrate the effectiveness of our model, which significantly outperforms the strong baselines for both present and absent keyphrases generation. Furthermore, we extend SGG to a title generation task which indicates its extensibility in natural language generation tasks.

pdf
Selective Attention Based Graph Convolutional Networks for Aspect-Level Sentiment Classification
Xiaochen Hou | Jing Huang | Guangtao Wang | Peng Qi | Xiaodong He | Bowen Zhou
Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15)

Recent work on aspect-level sentiment classification has employed Graph Convolutional Networks (GCN) over dependency trees to learn interactions between aspect terms and opinion words. In some cases, the corresponding opinion words for an aspect term cannot be reached within two hops on dependency trees, which requires more GCN layers to model. However, GCNs often achieve the best performance with two layers, and deeper GCNs do not bring any additional gain. Therefore, we design a novel selective attention based GCN model. On one hand, the proposed model enables the direct interaction between aspect terms and context words via the self-attention operation without the distance limitation on dependency trees. On the other hand, a top-k selection procedure is designed to locate opinion words by selecting k context words with the highest attention scores. We conduct experiments on several commonly used benchmark datasets and the results show that our proposed SA-GCN outperforms strong baseline models.

pdf
Learn to Copy from the Copying History: Correlational Copy Network for Abstractive Summarization
Haoran Li | Song Xu | Peng Yuan | Yujia Wang | Youzheng Wu | Xiaodong He | Bowen Zhou
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The copying mechanism has had considerable success in abstractive summarization, facilitating models to directly copy words from the input text to the output summary. Existing works mostly employ encoder-decoder attention, which applies copying at each time step independently of the former ones. However, this may sometimes lead to incomplete copying. In this paper, we propose a novel copying scheme named Correlational Copying Network (CoCoNet) that enhances the standard copying mechanism by keeping track of the copying history. It thereby takes advantage of prior copying distributions and, at each time step, explicitly encourages the model to copy the input word that is relevant to the previously copied one. In addition, we strengthen CoCoNet through pre-training with suitable corpora that simulate the copying behaviors. Experimental results show that CoCoNet can copy more accurately and achieves new state-of-the-art performances on summarization benchmarks, including CNN/DailyMail for news summarization and SAMSum for dialogue summarization. The code and checkpoint will be publicly available.

2020

pdf
Self-Attention Guided Copy Mechanism for Abstractive Summarization
Song Xu | Haoran Li | Peng Yuan | Youzheng Wu | Xiaodong He | Bowen Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Copy module has been widely equipped in the recent abstractive summarization models, which facilitates the decoder to extract words from the source into the summary. Generally, the encoder-decoder attention is served as the copy distribution, while how to guarantee that important words in the source are copied remains a challenge. In this work, we propose a Transformer-based model to enhance the copy mechanism. Specifically, we identify the importance of each source word based on the degree centrality with a directed graph built by the self-attention layer in the Transformer. We use the centrality of each source word to guide the copy process explicitly. Experimental results show that the self-attention graph provides useful guidance for the copy distribution. Our proposed models significantly outperform the baseline methods on the CNN/Daily Mail dataset and the Gigaword dataset.

pdf
Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding
Yun Tang | Jing Huang | Guangtao Wang | Xiaodong He | Bowen Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Distance-based knowledge graph embeddings have shown substantial improvement on the knowledge graph link prediction task, from TransE to the latest state-of-the-art RotatE. However, complex relations such as N-to-1, 1-to-N and N-to-N still remain challenging to predict. In this work, we propose a novel distance-based approach for knowledge graph link prediction. First, we extend the RotatE from 2D complex domain to high dimensional space with orthogonal transforms to model relations. The orthogonal transform embedding for relations keeps the capability for modeling symmetric/anti-symmetric, inverse and compositional relations while achieves better modeling capacity. Second, the graph context is integrated into distance scoring functions directly. Specifically, graph context is explicitly modeled via two directed context representations. Each node embedding in knowledge graph is augmented with two context representations, which are computed from the neighboring outgoing and incoming nodes/edges respectively. The proposed approach improves prediction accuracy on the difficult N-to-1, 1-to-N and N-to-N cases. Our experimental results show that it achieves state-of-the-art results on two common benchmarks FB15k-237 and WNRR-18, especially on FB15k-237 which has many high in-degree nodes.

pdf
The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service
Meng Chen | Ruixue Liu | Lei Shen | Shaozu Yuan | Jingyan Zhou | Youzheng Wu | Xiaodong He | Bowen Zhou
Proceedings of the Twelfth Language Resources and Evaluation Conference

Human conversations are complicated and building a human-like dialogue agent is an extremely challenging task. With the rapid development of deep learning techniques, data-driven models become more and more prevalent which need a huge amount of real conversation data. In this paper, we construct a large-scale real scenario Chinese E-commerce conversation corpus, JDDC, with more than 1 million multi-turn dialogues, 20 million utterances, and 150 million words. The dataset reflects several characteristics of human-human conversations, e.g., goal-driven, and long-term dependency among the context. It also covers various dialogue types including task-oriented, chitchat and question-answering. Extra intent information and three well-annotated challenge sets are also provided. Then, we evaluate several retrieval-based and generative models to provide basic benchmark performance on the JDDC corpus. And we hope JDDC can serve as an effective testbed and benefit the development of fundamental research in dialogue task.

pdf
Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product
Tiangang Zhu | Yue Wang | Haoran Li | Youzheng Wu | Xiaodong He | Bowen Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product recommendations, and product retrieval. While in the real world, the attribute values of a product are usually incomplete and vary over time, which greatly hinders the practical applications. In this paper, we propose a multimodal method to jointly predict product attributes and extract values from textual product descriptions with the help of the product images. We argue that product attributes and values are highly correlated, e.g., it will be easier to extract the values on condition that the product attributes are given. Thus, we jointly model the attribute prediction and value extraction tasks from multiple aspects towards the interactions between attributes and values. Moreover, product images have distinct effects on our tasks for different product attributes and values. Thus, we selectively draw useful visual information from product images to enhance our model. We annotate a multimodal product attribute value dataset that contains 87,194 instances, and the experimental results on this dataset demonstrate that explicitly modeling the relationship between attributes and values facilitates our method to establish the correspondence between them, and selectively utilizing visual product information is necessary for the task. Our code and dataset are available at https://github.com/jd-aig/JAVE.

pdf
Multimodal Sentence Summarization via Multimodal Selective Encoding
Haoran Li | Junnan Zhu | Jiajun Zhang | Xiaodong He | Chengqing Zong
Proceedings of the 28th International Conference on Computational Linguistics

This paper studies the problem of generating a summary for a given sentence-image pair. Existing multimodal sequence-to-sequence approaches mainly focus on enhancing the decoder by visual signals, while ignoring that the image can improve the ability of the encoder to identify highlights of a news event or a document. Thus, we propose a multimodal selective gate network that considers reciprocal relationships between textual and multi-level visual features, including global image descriptor, activation grids, and object proposals, to select highlights of the event when encoding the source sentence. In addition, we introduce a modality regularization to encourage the summary to capture the highlights embedded in the image more accurately. To verify the generalization of our model, we adopt the multimodal selective gate to the text-based decoder and multimodal-based decoder. Experimental results on a public multimodal sentence summarization dataset demonstrate the advantage of our models over baselines. Further analysis suggests that our proposed multimodal selective gate network can effectively select important information in the input sentence.

pdf
On the Faithfulness for E-commerce Product Summarization
Peng Yuan | Haoran Li | Song Xu | Youzheng Wu | Xiaodong He | Bowen Zhou
Proceedings of the 28th International Conference on Computational Linguistics

In this work, we present a model to generate e-commerce product summaries. The consistency between the generated summary and the product attributes is an essential criterion for the ecommerce product summarization task. To enhance the consistency, first, we encode the product attribute table to guide the process of summary generation. Second, we identify the attribute words from the vocabulary, and we constrain these attribute words can be presented in the summaries only through copying from the source, i.e., the attribute words not in the source cannot be generated. We construct a Chinese e-commerce product summarization dataset, and the experimental results on this dataset demonstrate that our models significantly improve the faithfulness.

pdf
Learning to Decouple Relations: Few-Shot Relation Classification with Entity-Guided Attention and Confusion-Aware Training
Yingyao Wang | Junwei Bao | Guangyi Liu | Youzheng Wu | Xiaodong He | Bowen Zhou | Tiejun Zhao
Proceedings of the 28th International Conference on Computational Linguistics

This paper aims to enhance the few-shot relation classification especially for sentences that jointly describe multiple relations. Due to the fact that some relations usually keep high co-occurrence in the same context, previous few-shot relation classifiers struggle to distinguish them with few annotated instances. To alleviate the above relation confusion problem, we propose CTEG, a model equipped with two novel mechanisms to learn to decouple these easily-confused relations. On the one hand, an Entity -Guided Attention (EGA) mechanism, which leverages the syntactic relations and relative positions between each word and the specified entity pair, is introduced to guide the attention to filter out information causing confusion. On the other hand, a Confusion-Aware Training (CAT) method is proposed to explicitly learn to distinguish relations by playing a pushing-away game between classifying a sentence into a true relation and its confusing relation. Extensive experiments are conducted on the FewRel dataset, and the results show that our proposed model achieves comparable and even much better results to strong baselines in terms of accuracy. Furthermore, the ablation test and case study verify the effectiveness of our proposed EGA and CAT, especially in addressing the relation confusion problem.

pdf
Enhancing Automated Essay Scoring Performance via Fine-tuning Pre-trained Language Models with Combination of Regression and Ranking
Ruosong Yang | Jiannong Cao | Zhiyuan Wen | Youzheng Wu | Xiaodong He
Findings of the Association for Computational Linguistics: EMNLP 2020

Automated Essay Scoring (AES) is a critical text regression task that automatically assigns scores to essays based on their writing quality. Recently, the performance of sentence prediction tasks has been largely improved by using Pre-trained Language Models via fusing representations from different layers, constructing an auxiliary sentence, using multi-task learning, etc. However, to solve the AES task, previous works utilize shallow neural networks to learn essay representations and constrain calculated scores with regression loss or ranking loss, respectively. Since shallow neural networks trained on limited samples show poor performance to capture deep semantic of texts. And without an accurate scoring function, ranking loss and regression loss measures two different aspects of the calculated scores. To improve AES’s performance, we find a new way to fine-tune pre-trained language models with multiple losses of the same task. In this paper, we propose to utilize a pre-trained language model to learn text representations first. With scores calculated from the representations, mean square error loss and the batch-wise ListNet loss with dynamic weights constrain the scores simultaneously. We utilize Quadratic Weighted Kappa to evaluate our model on the Automated Student Assessment Prize dataset. Our model outperforms not only state-of-the-art neural models near 3 percent but also the latest statistic model. Especially on the two narrative prompts, our model performs much better than all other state-of-the-art models.

2019

pdf
Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs
Ming Tu | Guangtao Wang | Jing Huang | Yun Tang | Xiaodong He | Bowen Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Multi-hop reading comprehension (RC) across documents poses new challenge over single-document RC because it requires reasoning over multiple documents to reach the final answer. In this paper, we propose a new model to tackle the multi-hop RC problem. We introduce a heterogeneous graph with different types of nodes and edges, which is named as Heterogeneous Document-Entity (HDE) graph. The advantage of HDE graph is that it contains different granularity levels of information including candidates, documents and entities in specific document contexts. Our proposed model can do reasoning over the HDE graph with nodes representation initialized with co-attention and self-attention based context encoders. We employ Graph Neural Networks (GNN) based message passing algorithms to accumulate evidences on the proposed HDE graph. Evaluated on the blind test set of the Qangaroo WikiHop data set, our HDE graph based single model delivers competitive result, and the ensemble model achieves the state-of-the-art performance.

pdf
Relation Module for Non-Answerable Predictions on Reading Comprehension
Kevin Huang | Yun Tang | Jing Huang | Xiaodong He | Bowen Zhou
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Machine reading comprehension (MRC) has attracted significant amounts of research attention recently, due to an increase of challenging reading comprehension datasets. In this paper, we aim to improve a MRC model’s ability to determine whether a question has an answer in a given context (e.g. the recently proposed SQuAD 2.0 task). The relation module consists of both semantic extraction and relational information. We first extract high level semantics as objects from both question and context with multi-head self-attentive pooling. These semantic objects are then passed to a relation network, which generates relationship scores for each object pair in a sentence. These scores are used to determine whether a question is non-answerable. We test the relation module on the SQuAD 2.0 dataset using both the BiDAF and BERT models as baseline readers. We obtain 1.8% gain of F1 accuracy on top of the BiDAF reader, and 1.0% on top of the BERT base model. These results show the effectiveness of our relation module on MRC.

2018

pdf
Discourse-Aware Neural Rewards for Coherent Text Generation
Antoine Bosselut | Asli Celikyilmaz | Xiaodong He | Jianfeng Gao | Po-Sen Huang | Yejin Choi
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In this paper, we investigate the use of discourse-aware rewards with reinforcement learning to guide a model to generate long, coherent text. In particular, we propose to learn neural rewards to model cross-sentence ordering as a means to approximate desired discourse structure. Empirical results demonstrate that a generator trained with the learned reward produces more coherent and less repetitive text than models trained with cross-entropy or with reinforcement learning with commonly used scores as rewards.

pdf
Tensor Product Generation Networks for Deep NLP Modeling
Qiuyuan Huang | Paul Smolensky | Xiaodong He | Li Deng | Dapeng Wu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present a new approach to the design of deep networks for natural language processing (NLP), based on the general technique of Tensor Product Representations (TPRs) for encoding and processing symbol structures in distributed neural networks. A network architecture — the Tensor Product Generation Network (TPGN) — is proposed which is capable in principle of carrying out TPR computation, but which uses unconstrained deep learning to design its internal representations. Instantiated in a model for image-caption generation, TPGN outperforms LSTM baselines when evaluated on the COCO dataset. The TPR-capable structure enables interpretation of internal representations and operations, which prove to contain considerable grammatical content. Our caption-generation model can be interpreted as generating sequences of grammatical categories and retrieving words by their categories from a plan encoded as a distributed representation.

pdf
Deep Communicating Agents for Abstractive Summarization
Asli Celikyilmaz | Antoine Bosselut | Xiaodong He | Yejin Choi
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present deep communicating agents in an encoder-decoder architecture to address the challenges of representing a long document for abstractive summarization. With deep communicating agents, the task of encoding a long text is divided across multiple collaborating agents, each in charge of a subsection of the input text. These encoders are connected to a single decoder, trained end-to-end using reinforcement learning to generate a focused and coherent summary. Empirical results demonstrate that multiple communicating encoders lead to a higher quality summary compared to several strong baselines, including those based on a single encoder or multiple non-communicating encoders.

pdf
Natural Language to Structured Query Generation via Meta-Learning
Po-Sen Huang | Chenglong Wang | Rishabh Singh | Wen-tau Yih | Xiaodong He
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

In conventional supervised training, a model is trained to fit all the training examples. However, having a monolithic model may not always be the best strategy, as examples could vary widely. In this work, we explore a different learning protocol that treats each example as a unique pseudo-task, by reducing the original learning problem to a few-shot meta-learning scenario with the help of a domain-dependent relevance function. When evaluated on the WikiSQL dataset, our approach leads to faster convergence and achieves 1.1%–5.4% absolute accuracy gains over the non-meta-learning counterparts.

pdf
Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations
Dipendra Misra | Ming-Wei Chang | Xiaodong He | Wen-tau Yih
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Semantic parsing from denotations faces two key challenges in model training: (1) given only the denotations (e.g., answers), search for good candidate semantic parses, and (2) choose the best model update algorithm. We propose effective and general solutions to each of them. Using policy shaping, we bias the search procedure towards semantic parses that are more compatible to the text, which provide better supervision signals for training. In addition, we propose an update equation that generalizes three different families of learning algorithms, which enables fast model exploration. When experimented on a recently proposed sequential question answering dataset, our framework leads to a new state-of-the-art model that outperforms previous work by 5.0% absolute on exact match accuracy.

pdf
Deep Reinforcement Learning for NLP
William Yang Wang | Jiwei Li | Xiaodong He
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Many Natural Language Processing (NLP) tasks (including generation, language grounding, reasoning, information extraction, coreference resolution, and dialog) can be formulated as deep reinforcement learning (DRL) problems. However, since language is often discrete and the space for all sentences is infinite, there are many challenges for formulating reinforcement learning problems of NLP tasks. In this tutorial, we provide a gentle introduction to the foundation of deep reinforcement learning, as well as some practical DRL solutions in NLP. We describe recent advances in designing deep reinforcement learning for NLP, with a special focus on generation, dialogue, and information extraction. Finally, we discuss why they succeed, and when they may fail, aiming at providing some practical advice about deep reinforcement learning for solving real-world NLP problems.

2017

pdf
Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
David Golub | Po-Sen Huang | Xiaodong He | Li Deng
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We develop a technique for transfer learning in machine comprehension (MC) using a novel two-stage synthesis network. Given a high performing MC model in one domain, our technique aims to answer questions about documents in another domain, where we use no labeled data of question-answer pairs. Using the proposed synthesis network with a pretrained model on the SQuAD dataset, we achieve an F1 measure of 46.6% on the challenging NewsQA dataset, approaching performance of in-domain models (F1 measure of 50.0%) and outperforming the out-of-domain baseline by 7.6%, without use of provided annotations.

pdf
Learning Generic Sentence Representations Using Convolutional Neural Networks
Zhe Gan | Yunchen Pu | Ricardo Henao | Chunyuan Li | Xiaodong He | Lawrence Carin
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes. The model is learned by using a convolutional neural network as an encoder to map an input sentence into a continuous vector, and using a long short-term memory recurrent neural network as a decoder. Several tasks are considered, including sentence reconstruction and future sentence prediction. Further, a hierarchical encoder-decoder model is proposed to encode a sentence to predict multiple future sentences. By training our models on a large collection of novels, we obtain a highly generic convolutional sentence encoder that performs well in practice. Experimental results on several benchmark datasets, and across a broad range of applications, demonstrate the superiority of the proposed model over competing methods.

2016

pdf
Deep Reinforcement Learning with a Natural Language Action Space
Ji He | Jianshu Chen | Xiaodong He | Jianfeng Gao | Lihong Li | Li Deng | Mari Ostendorf
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Generating Natural Questions About an Image
Nasrin Mostafazadeh | Ishan Misra | Jacob Devlin | Margaret Mitchell | Xiaodong He | Lucy Vanderwende
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories
Nasrin Mostafazadeh | Nathanael Chambers | Xiaodong He | Devi Parikh | Dhruv Batra | Lucy Vanderwende | Pushmeet Kohli | James Allen
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Visual Storytelling
Ting-Hao Kenneth Huang | Francis Ferraro | Nasrin Mostafazadeh | Ishan Misra | Aishwarya Agrawal | Jacob Devlin | Ross Girshick | Xiaodong He | Pushmeet Kohli | Dhruv Batra | C. Lawrence Zitnick | Devi Parikh | Lucy Vanderwende | Michel Galley | Margaret Mitchell
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Hierarchical Attention Networks for Document Classification
Zichao Yang | Diyi Yang | Chris Dyer | Xiaodong He | Alex Smola | Eduard Hovy
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Character-Level Question Answering with Attention
Xiaodong He | David Golub
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf
Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads
Ji He | Mari Ostendorf | Xiaodong He | Jianshu Chen | Jianfeng Gao | Lihong Li | Li Deng
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf
Bi-directional Attention with Agreement for Dependency Parsing
Hao Cheng | Hao Fang | Xiaodong He | Jianfeng Gao | Li Deng
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

pdf
Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval
Xiaodong Liu | Jianfeng Gao | Xiaodong He | Li Deng | Kevin Duh | Ye-yi Wang
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Deep Learning and Continuous Representations for Natural Language Processing
Wen-tau Yih | Xiaodong He | Jianfeng Gao
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

pdf
Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base
Wen-tau Yih | Ming-Wei Chang | Xiaodong He | Jianfeng Gao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf
Language Models for Image Captioning: The Quirks and What Works
Jacob Devlin | Hao Cheng | Hao Fang | Saurabh Gupta | Li Deng | Xiaodong He | Geoffrey Zweig | Margaret Mitchell
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf
Data Selection With Fewer Words
Amittai Axelrod | Philip Resnik | Xiaodong He | Mari Ostendorf
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

pdf
Learning Continuous Phrase Representations for Translation Modeling
Jianfeng Gao | Xiaodong He | Wen-tau Yih | Li Deng
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Semantic Parsing for Single-Relation Question Answering
Wen-tau Yih | Xiaodong He | Christopher Meek
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
Modeling Interestingness with Deep Neural Networks
Jianfeng Gao | Patrick Pantel | Michael Gamon | Xiaodong He | Li Deng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf
MSR-FBK IWSLT 2013 SLT system description
Anthony Aue | Qin Gao | Hany Hassan | Xiaodong He | Gang Li | Nicholas Ruiz | Frank Seide
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the systems used for the MSR+FBK submission for the SLT track of IWSLT 2013. Starting from a baseline system we made a series of iterative and additive improvements, including a novel method for processing bilingual data used to train MT systems for use on ASR output. Our primary submission is a system combination of five individual systems, combining the output of multiple ASR engines with multiple MT techniques. There are two contrastive submissions to help place the combined system in context. We describe the systems used and present results on the test sets.

pdf
Training MRF-Based Phrase Translation Models using Gradient Ascent
Jianfeng Gao | Xiaodong He
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

pdf
Learning Lexicon Models from Search Logs for Query Expansion
Jianfeng Gao | Shasha Xie | Xiaodong He | Alnur Ali
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf
Review of Hypothesis Alignment Algorithms for MT System Combination via Confusion Network Decoding
Antti-Veikko Rosti | Xiaodong He | Damianos Karakos | Gregor Leusch | Yuan Cao | Markus Freitag | Spyros Matsoukas | Hermann Ney | Jason Smith | Bing Zhang
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf
Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
Xiaodong He | Li Deng
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf
The MSR system for IWSLT 2011 evaluation
Xiaodong He | Amittai Axelrod | Li Deng | Alex Acero | Mei-Yuh Hwang | Alisa Nguyen | Andrew Wang | Xiahui Huang
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the Microsoft Research (MSR) system for the evaluation campaign of the 2011 international workshop on spoken language translation. The evaluation task is to translate TED talks (www.ted.com). This task presents two unique challenges: First, the underlying topic switches sharply from talk to talk. Therefore, the translation system needs to adapt to the current topic quickly and dynamically. Second, only a very small amount of relevant parallel data (transcripts of TED talks) is available. Therefore, it is necessary to perform accurate translation model estimation with limited data. In the preparation for the evaluation, we developed two new methods to attack these problems. Specifically, we developed an unsupervised topic modeling based adaption method for machine translation models. We also developed a discriminative training method to estimate parameters in the generative components of the translation models with limited data. Experimental results show that both methods improve the translation quality. Among all the submissions, ours achieves the best BLEU score in the machine translation Chinese-to-English track (MT_CE) of the IWSLT 2011 evaluation that we participated.

pdf
Domain Adaptation via Pseudo In-Domain Data Selection
Amittai Axelrod | Xiaodong He | Jianfeng Gao
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2009

pdf
Joint Optimization for Machine Translation System Combination
Xiaodong He | Kristina Toutanova
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf
Using N-gram based Features for Machine Translation System Combination
Yong Zhao | Xiaodong He
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf
Incremental HMM Alignment for MT System Combination
Chi-Ho Li | Xiaodong He | Yupeng Liu | Ning Xi
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf
Indirect-HMM-based Hypothesis Alignment for Combining Outputs from Machine Translation Systems
Xiaodong He | Mei Yang | Jianfeng Gao | Patrick Nguyen | Robert Moore
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf
Training Non-Parametric Features for Statistical Machine Translation
Patrick Nguyen | Milind Mahajan | Xiaodong He
Proceedings of the Second Workshop on Statistical Machine Translation

pdf
Using Word-Dependent Transition Models in HMM-Based Word Alignment for Statistical Machine Translation
Xiaodong He
Proceedings of the Second Workshop on Statistical Machine Translation

pdf
Automatic validation of terminology translation consistenscy with statistical method
Masaki Itagaki | Takako Aikawa | Xiaodong He
Proceedings of Machine Translation Summit XI: Papers

Search
Co-authors