Zhoujun Li


2022

pdf
Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation
Xinyu Pi | Bing Wang | Yan Gao | Jiaqi Guo | Zhoujun Li | Jian-Guang Lou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The robustness of Text-to-SQL parsers against adversarial perturbations plays a crucial role in delivering highly reliable applications. Previous studies along this line primarily focused on perturbations on the natural language question side, neglecting the variability of tables. Motivated by this, we propose Adversarial Table Perturbation (ATP) as a new attack paradigm for measuring the robustness of Text-to-SQL models. Following this proposition, we curate ADVETA, the first robustness evaluation benchmark featuring natural and realistic ATPs. All tested state-of-the-art models experience dramatic performance drops on ADVETA, revealing significant room for improvement. To defend against ATP, we build a systematic framework for generating adversarial training examples, tailored for better contextualization of tabular data. Experiments show that our approach brings models the best robustness improvement against ATP while also substantially boosting their robustness against NL-side perturbations. We will release ADVETA and our code to facilitate future research.
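To make the attack paradigm concrete, here is a minimal Python sketch of a table perturbation. The helper name, the candidate lists, and the two attack types shown (replacing a column name with a confusable paraphrase, adding a distracting column) are illustrative assumptions; ADVETA's actual perturbations are human-curated.

    # A minimal, hypothetical sketch of an adversarial table perturbation (ATP).
    def perturb_table(columns, replacements, distractors):
        """Return a perturbed copy of a table schema.

        columns: original column names, e.g. ["employee_name", "salary"]
        replacements: map from a column to a confusable paraphrase
        distractors: extra columns lexically close to the originals
        """
        perturbed = [replacements.get(c, c) for c in columns]
        return perturbed + distractors

    original = ["employee_name", "salary"]
    perturbed = perturb_table(
        original,
        replacements={"salary": "monthly_pay"},  # replacement attack
        distractors=["salary_grade"],            # addition attack
    )
    print(perturbed)  # ['employee_name', 'monthly_pay', 'salary_grade']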

pdf
Towards Modeling Role-Aware Centrality for Dialogue Summarization
Xinnian Liang | Chao Bian | Shuangzhi Wu | Zhoujun Li
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Role-oriented dialogue summarization generates summaries for the different roles in a dialogue (e.g., doctor and patient). Existing methods consider each role separately, so interactions among roles are not fully explored. In this paper, we propose a novel Role-Aware Centrality (RAC) model to capture role interactions, which can be easily applied to any seq2seq model. RAC assigns each role a specific sentence-level centrality score, using role prompts to control what kind of summary to generate, and it measures both the importance of utterances and the relevance between roles and utterances. We then use RAC to re-weight context representations, which the decoder uses to generate role summaries. We verify RAC on two public benchmark datasets, CSDS and MC. Experimental results show that the proposed method achieves new state-of-the-art results on both datasets. Extensive analyses demonstrate that role-aware centrality helps generate summaries more precisely.
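A minimal sketch of the re-weighting step, assuming sentence-level centrality scores are already computed for one role; the scoring and role-prompting machinery of the actual RAC model is not shown.

    import torch

    def role_reweight(hidden, centrality):
        """hidden: [num_utts, dim] encoder states; centrality: [num_utts]
        scores for one role. Scale states by normalized centrality."""
        weights = torch.softmax(centrality, dim=0)  # normalize over utterances
        return hidden * weights.unsqueeze(-1)       # broadcast over dim

    hidden = torch.randn(5, 16)           # 5 utterances, 16-dim states
    doctor_centrality = torch.randn(5)    # role-specific scores
    reweighted = role_reweight(hidden, doctor_centrality)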

pdf
TANet: Thread-Aware Pretraining for Abstractive Conversational Summarization
Ze Yang | Christian Wang | Zhoujin Tian | Wei Wu | Zhoujun Li
Findings of the Association for Computational Linguistics: NAACL 2022

Although pre-trained language models (PLMs) have achieved great success and become a milestone in NLP, abstractive conversational summarization remains a challenging and less studied task. The difficulty lies in two aspects: the lack of large-scale conversational summarization data, and the fact that applying existing pre-trained models to this task is tricky because of the structural dependencies within conversations and their informal expression. In this work, we first build a large-scale (11M) pretraining dataset called RCSum, based on multi-person discussions in the Reddit community. We then present TANet, a thread-aware Transformer-based network. Unlike existing pre-trained models that treat a conversation as a flat sequence of sentences, we argue that the inherent contextual dependencies among utterances play an essential role in understanding the entire conversation, and we therefore propose two new techniques to incorporate this structural information into our model. The first is thread-aware attention, which is computed by taking the contextual dependencies among utterances into account. The second is a thread prediction loss that predicts the relations between utterances. We evaluate our model on four datasets of real conversations, covering meeting transcripts, customer-service records, and forum threads. Experimental results demonstrate that TANet achieves a new state of the art in terms of both automatic evaluation and human judgment.
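The sketch below illustrates the general mechanism of thread-aware attention under simplifying assumptions: standard scaled dot-product attention restricted by a mask derived from the reply-to structure, so each utterance attends only to itself and its thread ancestors. The real TANet formulation may differ.

    import torch

    def thread_mask(parents):
        """parents[i] = index of the utterance that utterance i replies to
        (-1 for roots). Allow attention to self and all ancestors."""
        n = len(parents)
        mask = torch.zeros(n, n, dtype=torch.bool)
        for i in range(n):
            j = i
            while j != -1:
                mask[i, j] = True
                j = parents[j]
        return mask

    def masked_attention(q, k, v, mask):
        scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        scores = scores.masked_fill(~mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

    parents = [-1, 0, 1, 0]    # a small reply tree
    x = torch.randn(4, 8)
    out = masked_attention(x, x, x, thread_mask(parents))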

pdf
PAEG: Phrase-level Adversarial Example Generation for Neural Machine Translation
Juncheng Wan | Jian Yang | Shuming Ma | Dongdong Zhang | Weinan Zhang | Yong Yu | Zhoujun Li
Proceedings of the 29th International Conference on Computational Linguistics

While end-to-end neural machine translation (NMT) has achieved impressive progress, noisy input usually makes models fragile and unstable. Generating adversarial examples as augmented data has proved useful for alleviating this problem. Existing methods for adversarial example generation (AEG) operate at the word or character level and ignore the ubiquitous phrase structure. In this paper, we propose a Phrase-level Adversarial Example Generation (PAEG) framework to enhance the robustness of translation models. Our method improves on gradient-based word-level AEG by adopting a phrase-level substitution strategy. We verify our method on three benchmarks: the LDC Chinese-English, IWSLT14 German-English, and WMT14 English-German tasks. Experimental results demonstrate that our approach significantly improves translation performance and robustness to noise compared to previous strong baselines.
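A simplified sketch of gradient-guided phrase substitution: among candidate substitute phrases, pick the one whose embedding moves farthest along the loss gradient, i.e., the most adversarial. The function name and scoring rule are illustrative assumptions, not the paper's exact procedure.

    import torch

    def pick_adversarial_phrase(phrase_emb, grad, candidates):
        """phrase_emb: [dim] embedding of the original phrase.
        grad: [dim] gradient of the loss w.r.t. that embedding.
        candidates: dict name -> [dim] embedding of substitute phrases."""
        def score(name):
            return torch.dot(candidates[name] - phrase_emb, grad)
        return max(candidates, key=score)  # largest expected loss increase

    dim = 8
    emb, grad = torch.randn(dim), torch.randn(dim)
    cands = {"at the airport": torch.randn(dim),
             "in the station": torch.randn(dim)}
    print(pick_adversarial_phrase(emb, grad, cands))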

pdf
An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization Framework Based on Semantic Blocks
Xinnian Liang | Jing Li | Shuangzhi Wu | Jiali Zeng | Yufan Jiang | Mu Li | Zhoujun Li
Proceedings of the 29th International Conference on Computational Linguistics

Unsupervised summarization methods have achieved remarkable results by incorporating representations from pre-trained language models. However, existing methods fail to balance efficiency and effectiveness when the input document is extremely long. To tackle this problem, we propose an efficient Coarse-to-Fine Facet-Aware Ranking (C2F-FAR) framework for unsupervised long-document summarization, based on semantic blocks. A semantic block is a group of continuous sentences in the document that describe the same facet. Specifically, we convert the one-step ranking method into a hierarchical, multi-granularity two-stage ranking. In the coarse-level stage, we propose a new segmentation algorithm that splits the document into facet-aware semantic blocks and then filters out insignificant blocks. In the fine-level stage, we select salient sentences in each block and then extract the final summary from the selected sentences. We evaluate our framework on four long-document summarization datasets: Gov-Report, BillSum, arXiv, and PubMed. C2F-FAR achieves new state-of-the-art unsupervised summarization results on Gov-Report and BillSum. In addition, our method runs 4-28 times faster than previous methods.
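The following sketch mirrors only the two-stage shape of the framework, under strong simplifications: split on drops in adjacent-sentence similarity, keep the blocks closest to the document centroid, then pick one central sentence per kept block. The threshold and scoring choices here are assumptions.

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def summarize(sent_embs, block_threshold=0.5, keep_blocks=2):
        # Coarse stage: segment into facet-aware semantic blocks.
        blocks, current = [], [0]
        for i in range(1, len(sent_embs)):
            if cosine(sent_embs[i - 1], sent_embs[i]) < block_threshold:
                blocks.append(current)
                current = []
            current.append(i)
        blocks.append(current)
        # Filter: keep the blocks most similar to the document centroid.
        centroid = np.mean(sent_embs, axis=0)
        def block_score(b):
            return np.mean([cosine(sent_embs[i], centroid) for i in b])
        blocks = sorted(blocks, key=block_score, reverse=True)[:keep_blocks]
        # Fine stage: the most central sentence inside each kept block.
        return sorted(max(b, key=lambda i: cosine(sent_embs[i], centroid))
                      for b in blocks)

    embs = np.random.randn(10, 32)
    print(summarize(embs))  # indices of selected sentences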

pdf
Modeling Multi-Granularity Hierarchical Features for Relation Extraction
Xinnian Liang | Shuangzhi Wu | Mu Li | Zhoujun Li
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Relation extraction (RE) is a key task in Natural Language Processing (NLP) that aims to extract relations between entity pairs from given texts. Recently, RE has achieved remarkable progress with the development of deep neural networks. Most existing research focuses on constructing explicit structured features using external knowledge such as knowledge graphs and dependency trees. In this paper, we propose a novel method to extract multi-granularity features based solely on the original input sentences. We show that effective structured features can be attained even without external knowledge. Three kinds of features based on the input sentences are fully exploited, at the entity-mention, segment, and sentence levels, and all three are jointly and hierarchically modeled. We evaluate our method on three public benchmarks: SemEval 2010 Task 8, TACRED, and TACRED Revisited. To verify its effectiveness, we apply our method to different encoders such as LSTM and BERT. Experimental results show that our method significantly outperforms existing state-of-the-art models, even those that use external knowledge. Extensive analyses demonstrate that the performance of our model comes from capturing multi-granularity features and modeling their hierarchical structure.

2021

pdf
Improving Unsupervised Extractive Summarization with Facet-Aware Modeling
Xinnian Liang | Shuangzhi Wu | Mu Li | Zhoujun Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
Enhancing Dialogue-based Relation Extraction by Speaker and Trigger Words Prediction
Tianyang Zhao | Zhao Yan | Yunbo Cao | Zhoujun Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
TWT: Table with Written Text for Controlled Data-to-Text Generation
Tongliang Li | Lei Fang | Jian-Guang Lou | Zhoujun Li
Findings of the Association for Computational Linguistics: EMNLP 2021

Large pre-trained neural models have recently shown remarkable progress in text generation. In this paper, we propose to generate text conditioned on structured data (a table) and a prefix (the written text) by leveraging pre-trained models. We present a new data-to-text dataset, Table with Written Text (TWT), built by repurposing two existing datasets: ToTTo and TabFact. TWT contains both factual and logical statements that are faithful to the structured data, and it aims to serve as a useful benchmark for controlled text generation. Compared with existing data-to-text task settings, TWT is more intuitive: the prefix (usually provided by the user) controls the topic of the generated text. On TWT, existing methods usually output hallucinated text that is not faithful to the input. We therefore design a novel approach with table-aware attention visibility and a copy mechanism over the table. Experimental results show that our approach outperforms state-of-the-art methods under both automatic and human evaluation metrics.

pdf
Smart-Start Decoding for Neural Machine Translation
Jian Yang | Shuming Ma | Dongdong Zhang | Juncheng Wan | Zhoujun Li | Ming Zhou
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Most current neural machine translation models adopt a monotonic decoding order, either left-to-right or right-to-left. In this work, we propose a novel method, Smart-Start decoding, that breaks the limitation of these decoding orders. More specifically, our method first predicts a median word, then decodes the words to the right of the median word, and finally generates the words to its left. We evaluate the proposed Smart-Start decoding method on three datasets. Experimental results show that the proposed method significantly outperforms strong baseline models.
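A toy sketch of the decoding order, with hypothetical generate_right/generate_left callables standing in for the model's conditional decoders:

    def smart_start_decode(median, generate_right, generate_left):
        right = generate_right(median)         # tokens after the median word
        left = generate_left(median, right)    # tokens before it, given the rest
        return left + [median] + right

    # Example with trivial stand-in generators:
    out = smart_start_decode(
        "sat",
        generate_right=lambda m: ["on", "the", "mat"],
        generate_left=lambda m, r: ["the", "cat"],
    )
    print(" ".join(out))  # "the cat sat on the mat"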

pdf
Matching Distributions between Model and Data: Cross-domain Knowledge Distillation for Unsupervised Domain Adaptation
Bo Zhang | Xiaoming Zhang | Yun Liu | Lei Cheng | Zhoujun Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a source domain to an unlabeled target domain. Existing methods typically learn to adapt the target model by exploiting the source data and sharing the network architecture across domains. However, this pipeline puts the source data at risk and is inflexible for deploying the target model. This paper tackles a novel setting where only a trained source model is available, and different network architectures can be adapted for the target domain depending on the deployment environment. We propose a generic framework named Cross-domain Knowledge Distillation (CdKD) that requires no source data. CdKD matches the joint distributions between a trained source model and a set of target data while distilling knowledge from the source model to the target domain. As an important type of knowledge in the source domain, gradient information is exploited for the first time to boost transfer performance. Experiments on cross-domain text classification demonstrate that CdKD achieves superior performance, verifying its effectiveness in this novel setting.

pdf
Multilingual Agreement for Multilingual Neural Machine Translation
Jian Yang | Yuwei Yin | Shuming Ma | Haoyang Huang | Dongdong Zhang | Zhoujun Li | Furu Wei
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Although multilingual neural machine translation (MNMT) enables translation across multiple languages, the training process is based on independent multilingual objectives. Most multilingual models cannot explicitly exploit different language pairs to assist each other, ignoring the relationships among them. In this work, we propose a novel agreement-based method that encourages multilingual agreement among different translation directions by minimizing the differences among them. We combine the multilingual training objectives with the agreement term by randomly substituting some fragments of the source language with their counterpart translations in auxiliary languages. To examine the effectiveness of our method, we conduct experiments on a multilingual translation task of 10 language pairs. Experimental results show that our method achieves significant improvements over previous multilingual baselines.
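A sketch of the substitution step, assuming a toy word-level lexicon in place of real phrase alignments; the substitution probability is an illustrative choice.

    import random

    def substitute_fragments(tokens, aux_lexicon, p=0.3, seed=0):
        """Randomly swap aligned source fragments for their
        auxiliary-language counterparts (code-switching)."""
        rng = random.Random(seed)
        return [aux_lexicon.get(t, t) if rng.random() < p else t
                for t in tokens]

    src = ["the", "cat", "drinks", "milk"]
    de_lexicon = {"cat": "Katze", "milk": "Milch"}  # English -> German
    print(substitute_fragments(src, de_lexicon))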

pdf
Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context
Xinnian Liang | Shuangzhi Wu | Mu Li | Zhoujun Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Embedding-based methods are widely used for unsupervised keyphrase extraction (UKE). Generally, these methods simply compute similarities between phrase embeddings and the document embedding, which is insufficient to capture the different kinds of context needed for a more effective UKE model. In this paper, we propose a novel UKE method in which local and global context are jointly modeled. From the global view, we calculate the similarity between a phrase and the whole document in the vector space, as traditional embedding-based models do. From the local view, we first build a graph over the document in which phrases are vertices and edges are similarities between vertices; we then propose a new centrality computation method to capture locally salient information from this graph. Finally, we combine the global and local context for ranking. We evaluate our model on three public benchmarks (Inspec, DUC 2001, SemEval 2010) and compare it with existing state-of-the-art models. The results show that our model outperforms most models while generalizing better to input documents of different domains and lengths. An additional ablation study shows that both the local and the global information are crucial for unsupervised keyphrase extraction.
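A minimal sketch of joint local-global ranking: phrase-document similarity gives the global score, and a degree-style centrality over a phrase similarity graph stands in for the paper's centrality computation; the interpolation weight is an assumption.

    import numpy as np

    def rank_phrases(phrase_embs, doc_emb, alpha=0.5):
        def sim(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        # Global view: similarity to the whole document.
        global_scores = np.array([sim(p, doc_emb) for p in phrase_embs])
        # Local view: degree centrality on the phrase similarity graph.
        n = len(phrase_embs)
        adj = np.array([[sim(phrase_embs[i], phrase_embs[j]) if i != j else 0.0
                         for j in range(n)] for i in range(n)])
        local_scores = adj.sum(axis=1) / (n - 1)
        final = alpha * global_scores + (1 - alpha) * local_scores
        return np.argsort(-final)  # best phrases first

    phrases = np.random.randn(6, 16)
    doc = phrases.mean(axis=0)
    print(rank_phrases(phrases, doc))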

pdf
Jointly Learning to Repair Code and Generate Commit Message
Jiaqi Bai | Long Zhou | Ambrosio Blanco | Shujie Liu | Furu Wei | Ming Zhou | Zhoujun Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We propose the novel task of jointly repairing program code and generating commit messages. Code repair and commit message generation are two essential and related tasks in software development, but existing work usually performs them independently. For this novel task we construct a multilingual triple dataset of buggy code, fixed code, and commit messages. We first introduce a cascaded method with two models: one generates the fixed code, and the other generates the commit message based on the fixed and original code. We enhance the cascaded method with different training approaches, including a teacher-student method, a multi-task method, and a back-translation method. To deal with the error propagation problem of the cascaded method, we also propose a joint model that both repairs the program code and generates the commit message in a unified framework. Extensive experiments on our buggy-fixed-commit dataset reflect the difficulty of this task and show that the enhanced cascaded model and the proposed joint model significantly outperform baselines in the quality of both code and commit messages.
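A schematic of the cascaded variant, with the two models passed in as stand-in callables; the error propagation from the repair model into the message model is exactly what the joint model is designed to avoid.

    def cascaded(buggy_code, repair_model, message_model):
        fixed_code = repair_model(buggy_code)            # model A
        message = message_model(buggy_code, fixed_code)  # model B
        return fixed_code, message

    fixed, msg = cascaded(
        "if (x = 1) {...}",
        repair_model=lambda b: b.replace("x = 1", "x == 1"),
        message_model=lambda b, f: "Fix assignment used as condition",
    )
    print(fixed, "|", msg)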

2020

pdf
Improving Neural Machine Translation with Soft Template Prediction
Jian Yang | Shuming Ma | Dongdong Zhang | Zhoujun Li | Ming Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Although neural machine translation (NMT) has achieved significant progress in recent years, most previous NMT models depend only on the source text to generate the translation. Inspired by the success of template-based and syntax-based approaches in other fields, we propose to use templates extracted from tree structures as soft target templates to guide the translation procedure. To learn the syntactic structure of the target sentences, we adopt constituency-based parse trees to generate candidate templates. We incorporate the template information into the encoder-decoder framework to jointly exploit the templates and the source text. Experiments show that our model significantly outperforms baseline models on four benchmarks, demonstrating the effectiveness of soft target templates.

pdf
TableBank: Table Benchmark for Image-based Table Detection and Recognition
Minghao Li | Lei Cui | Shaohan Huang | Furu Wei | Ming Zhou | Zhoujun Li
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet. Existing research on image-based table detection and recognition usually fine-tunes pre-trained models on out-of-domain data with a few thousand human-labeled examples, which makes it difficult to generalize to real-world applications. With TableBank, which contains 417K high-quality labeled tables, we build several strong baselines using state-of-the-art deep neural network models. We make TableBank publicly available in the hope that it will empower more deep learning approaches to the table detection and recognition task. The dataset and models can be downloaded from https://github.com/doc-analysis/TableBank.

pdf
Entity Relative Position Representation based Multi-head Selection for Joint Entity and Relation Extraction
Tianyang Zhao | Zhao Yan | Yunbo Cao | Zhoujun Li
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Joint entity and relation extraction has received increasing interest recently, owing to its ability to exploit the interactions between the two steps. Among existing studies, the Multi-Head Selection (MHS) framework extracts entities and relations simultaneously and efficiently, but its performance is limited. In this paper, we propose several effective improvements to address this problem. First, we propose an entity-specific Relative Position Representation (eRPR) that allows the model to fully leverage the distance information between entities and context tokens. Second, we introduce an auxiliary Global Relation Classification (GRC) objective to enhance the learning of local contextual features. Moreover, we improve the semantic representation by adopting the pre-trained language model BERT as the feature encoder. Finally, these new components are closely integrated with the multi-head selection framework and optimized jointly. Extensive experiments on two benchmark datasets demonstrate that our approach substantially outperforms previous work on all evaluation metrics, improving relation F1 by +2.40% on CoNLL04 and +1.90% on ACE05.
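A sketch in the spirit of relative position representations: each context token receives an embedding indexed by its clipped signed distance to the entity span. The exact parameterization of eRPR in the paper may differ.

    import torch

    def relative_positions(seq_len, ent_start, ent_end, max_dist=8):
        """Clipped signed distance of each token to the entity span,
        shifted into non-negative embedding indices."""
        pos = []
        for i in range(seq_len):
            if i < ent_start:
                d = i - ent_start
            elif i > ent_end:
                d = i - ent_end
            else:
                d = 0  # inside the entity span
            pos.append(max(-max_dist, min(max_dist, d)) + max_dist)
        return torch.tensor(pos)  # indices in [0, 2 * max_dist]

    emb = torch.nn.Embedding(17, 4)  # 2 * max_dist + 1 buckets
    feats = emb(relative_positions(12, ent_start=3, ent_end=5))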

pdf
DocBank: A Benchmark Dataset for Document Layout Analysis
Minghao Li | Yiheng Xu | Lei Cui | Shaohan Huang | Furu Wei | Zhoujun Li | Ming Zhou
Proceedings of the 28th International Conference on Computational Linguistics

Document layout analysis usually relies on computer vision models to understand documents, ignoring textual information that is vital to capture. Meanwhile, high-quality labeled datasets with both visual and textual information are still insufficient. In this paper, we present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis. DocBank is constructed in a simple yet effective way, using weak supervision from LaTeX documents available on arXiv.org. With DocBank, models from different modalities can be compared fairly, and multi-modal approaches can be further investigated to boost the performance of document layout analysis. We build several strong baselines and manually split the data into train/dev/test sets for evaluation. Experimental results show that models trained on DocBank accurately recognize layout information for a variety of documents. The DocBank dataset is publicly available at https://github.com/doc-analysis/DocBank.

pdf
Formality Style Transfer with Shared Latent Space
Yunli Wang | Yu Wu | Lili Mou | Zhoujun Li | WenHan Chao
Proceedings of the 28th International Conference on Computational Linguistics

Conventional approaches to formality style transfer borrow models from neural machine translation, which typically requires massive parallel data for training. However, datasets for formality style transfer are considerably smaller than translation corpora. Moreover, we observe that informal and formal sentences closely resemble each other, unlike in translation, where the two languages have different vocabularies and grammars. In this paper, we present a new approach, Sequence-to-Sequence with Shared Latent Space (S2S-SLS), for formality style transfer, in which we propose two auxiliary losses and adopt joint training of bi-directional transfer and auto-encoding. Experimental results show that S2S-SLS (with either RNN or Transformer architectures) consistently outperforms baselines in various settings, especially when data is limited.

pdf
StyleDGPT: Stylized Response Generation with Pre-trained Language Models
Ze Yang | Wei Wu | Can Xu | Xinnian Liang | Jiaqi Bai | Liran Wang | Wei Wang | Zhoujun Li
Findings of the Association for Computational Linguistics: EMNLP 2020

Generating responses in a desired style has great potential to extend the applications of open-domain dialogue systems, yet it is hindered by the lack of parallel data for training. In this work, we explore this challenging task with pre-trained language models, which have brought breakthroughs to various natural language tasks. To this end, we introduce a KL loss and a style classifier into the fine-tuning step in order to steer response generation toward the target style at both the word level and the sentence level. Comprehensive empirical studies on two public datasets indicate that our model significantly outperforms state-of-the-art methods in terms of both style consistency and contextual coherence.
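A hedged sketch of the shape of the fine-tuning objective: maximum-likelihood loss plus a word-level KL term toward a style language model and a sentence-level style-classifier term. The weights and exact forms here are assumptions, not the paper's equations.

    import torch
    import torch.nn.functional as F

    def stylized_loss(lm_logits, style_lm_logits, targets, style_prob,
                      a=1.0, b=0.1, c=0.1):
        # Word-level maximum likelihood on the response tokens.
        nll = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                              targets.view(-1))
        # Word-level KL toward a language model of the target style.
        kl = F.kl_div(F.log_softmax(lm_logits, dim=-1),
                      F.softmax(style_lm_logits, dim=-1),
                      reduction="batchmean")
        # Sentence-level term from a style classifier's confidence.
        style = -torch.log(style_prob + 1e-8)
        return a * nll + b * kl + c * style

    V, T = 100, 6
    loss = stylized_loss(torch.randn(1, T, V), torch.randn(1, T, V),
                         torch.randint(0, V, (1, T)), torch.tensor(0.7))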

2019

pdf
A Sequential Matching Framework for Multi-Turn Response Selection in Retrieval-Based Chatbots
Yu Wu | Wei Wu | Chen Xing | Can Xu | Zhoujun Li | Ming Zhou
Computational Linguistics, Volume 45, Issue 1 - March 2019

We study the problem of response selection for multi-turn conversation in retrieval-based chatbots. The task involves matching a response candidate with a conversation context, the challenges for which include how to recognize important parts of the context and how to model the relationships among utterances in the context. Existing matching methods may lose important information in contexts, as we can interpret them with a unified framework in which contexts are transformed to fixed-length vectors without any interaction with responses before matching. This motivates us to propose a new matching framework that can sufficiently carry important information in contexts to matching while modeling relationships among utterances at the same time. The new framework, which we call the sequential matching framework (SMF), lets each utterance in a context interact with a response candidate at the first step and transforms the pair into a matching vector. The matching vectors are then accumulated following the order of the utterances in the context with a recurrent neural network (RNN) that models relationships among utterances. Context-response matching is then calculated from the hidden states of the RNN. Under SMF, we propose a sequential convolutional network and a sequential attention network and conduct experiments on two public data sets to test their performance. Experimental results show that both models can significantly outperform state-of-the-art matching methods. We also show that the models are interpretable, with visualizations that provide insights into how they capture and leverage important information in contexts for matching.
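A condensed sketch of the framework: turn each utterance-response pair into a matching vector (a bilinear interaction here, where the paper uses convolution or attention), accumulate the vectors in utterance order with a GRU, and score from the final hidden state. Layer sizes are arbitrary.

    import torch

    class SequentialMatcher(torch.nn.Module):
        def __init__(self, dim=32):
            super().__init__()
            self.match = torch.nn.Bilinear(dim, dim, dim)  # pair interaction
            self.rnn = torch.nn.GRU(dim, dim, batch_first=True)
            self.score = torch.nn.Linear(dim, 1)

        def forward(self, utterances, response):
            # utterances: [batch, num_utts, dim]; response: [batch, dim]
            resp = response.unsqueeze(1).expand_as(utterances)
            match_vecs = torch.tanh(self.match(utterances, resp))
            _, h = self.rnn(match_vecs)  # accumulate in utterance order
            return self.score(h[-1]).squeeze(-1)

    model = SequentialMatcher()
    scores = model(torch.randn(2, 5, 32), torch.randn(2, 32))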

pdf
Low-Resource Response Generation with Template Prior
Ze Yang | Wei Wu | Jian Yang | Can Xu | Zhoujun Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We study open-domain response generation with limited message-response pairs. The problem exists in real-world applications but has been less explored in existing work. Since the paired data alone are no longer enough to train a neural generation model, we consider leveraging large-scale unpaired data, which are much easier to obtain, and propose response generation with both paired and unpaired data. The generation model is an encoder-decoder architecture with templates as a prior, where the templates are estimated from the unpaired data with a neural hidden semi-Markov model. In this way, response generation learned from the small paired dataset can be aided by the semantic and syntactic knowledge in the large unpaired dataset. To balance the effect of the prior and the input message on response generation, we propose learning the whole generation model with an adversarial approach. Empirical studies on question response generation and sentiment response generation indicate that when only a few pairs are available, our model can significantly outperform several state-of-the-art response generation models in terms of both automatic and human evaluation.

pdf
Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL
Haoyan Liu | Lei Fang | Qian Liu | Bei Chen | Jian-Guang Lou | Zhoujun Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

One key component of text-to-SQL is predicting the comparison relations between columns and their values. To the best of our knowledge, no existing models explicitly introduce external common knowledge to address this problem, so their ability to predict comparison relations beyond the training data is limited. In this paper, we propose to leverage adjective-noun phrasing knowledge mined from the web to predict comparison relations in text-to-SQL. Experimental results on both the original and the re-split Spider dataset show that our approach achieves significant improvement over state-of-the-art methods on comparison relation prediction.

pdf
Harnessing Pre-Trained Neural Networks with Rules for Formality Style Transfer
Yunli Wang | Yu Wu | Lili Mou | Zhoujun Li | Wenhan Chao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Formality text style transfer plays an important role in various NLP applications, such as non-native speaker assistance and child education. Early studies normalized informal sentences with rules, before statistical and neural models became the prevailing methods in the field. While a rule-based system is still a common preprocessing step for formality style transfer in the neural era, it can introduce noise if the rules are used naively, e.g., for data preprocessing. To mitigate this problem, we study how to harness rules within a state-of-the-art neural network that is typically pretrained on massive corpora. We propose three fine-tuning methods and achieve a new state of the art on benchmark datasets.

pdf
Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation
Ze Yang | Can Xu | Wei Wu | Zhoujun Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Automatic news comment generation is beneficial for real applications but has not attracted enough attention from the research community. In this paper, we propose a "read-attend-comment" procedure for news comment generation and formalize the procedure with a reading network and a generation network. The reading network comprehends a news article and distills some important points from it; the generation network then creates a comment by attending to the extracted discrete points and the news title. We optimize the model in an end-to-end manner by maximizing a variational lower bound of the true objective using the back-propagation algorithm. Experimental results on two public datasets indicate that our model can significantly outperform existing methods in terms of both automatic evaluation and human judgment.

2018

pdf
Keyphrase Generation with Correlation Constraints
Jun Chen | Xiaoming Zhang | Yu Wu | Zhao Yan | Zhoujun Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we study automatic keyphrase generation. Although conventional approaches to this task show promising results, they neglect correlation among keyphrases, resulting in duplication and coverage issues. To solve these problems, we propose a new sequence-to-sequence architecture for keyphrase generation named CorrRNN, which captures correlation among multiple keyphrases in two ways. First, we employ a coverage vector to indicate whether each word in the source document has been summarized by previous phrases, improving keyphrase coverage. Second, preceding phrases are taken into account to eliminate duplicate phrases and improve result coherence. Experimental results show that our model significantly outperforms the state-of-the-art method on benchmark datasets in terms of both accuracy and diversity.
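A small sketch of the coverage mechanism used against duplication: keep a running sum of past attention distributions and penalize re-attending to already-covered source positions. Illustrative only, not the exact CorrRNN equations.

    import torch

    def coverage_attention(scores, coverage, penalty=1.0):
        """scores: [src_len] raw attention logits; coverage: [src_len]
        summed past attention. Return (attention, updated coverage)."""
        attn = torch.softmax(scores - penalty * coverage, dim=0)
        return attn, coverage + attn

    coverage = torch.zeros(7)
    for step in range(3):  # decoding steps
        attn, coverage = coverage_attention(torch.randn(7), coverage)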

pdf
Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots
Yu Wu | Wei Wu | Zhoujun Li | Ming Zhou
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose a method that can leverage unlabeled data to learn a matching model for response selection in retrieval-based chatbots. The method employs a sequence-to-sequence (Seq2Seq) model as a weak annotator to judge the matching degree of unlabeled pairs, and then performs learning with both the weak signals and the unlabeled data. Experimental results on two public data sets indicate that matching models achieve significant improvements when they are learned with the proposed method.
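A sketch of the weak-annotation step, with a hypothetical seq2seq_logp callable standing in for the trained generator: each unlabeled pair is scored by the length-normalized log-likelihood of the response, and the scores serve as soft matching labels.

    def weak_labels(pairs, seq2seq_logp):
        return [seq2seq_logp(ctx, resp) / max(len(resp.split()), 1)
                for ctx, resp in pairs]

    toy = [("how are you", "fine thanks"),
           ("how are you", "buy cheap watches")]
    print(weak_labels(toy, seq2seq_logp=lambda c, r: -2.0 * len(r.split())))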

2017

pdf
Beihang-MSRA at SemEval-2017 Task 3: A Ranking System with Neural Matching Features for Community Question Answering
Wenzheng Feng | Yu Wu | Wei Wu | Zhoujun Li | Ming Zhou
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper presents our system for SemEval-2017 Task 3, Community Question Answering (CQA). We develop a ranking system that is capable of capturing semantic relations between text pairs with little word overlap. In addition to traditional NLP features, we introduce several neural-network-based matching features that enable our system to measure text similarity beyond lexicons. Our system significantly outperforms baseline methods, placing second in Subtask A and fifth in Subtask B, which demonstrates its efficacy on answer selection and question retrieval.

pdf
Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots
Yu Wu | Wei Wu | Chen Xing | Ming Zhou | Zhoujun Li
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study response selection for multi-turn conversation in retrieval-based chatbots. Existing work either concatenates the utterances in a context or finally matches a response with a highly abstract context vector, which may lose relationships among the utterances or important information in the context. We propose a sequential matching network (SMN) to address both problems. SMN first matches a response with each utterance in the context on multiple levels of granularity, and distills important matching information from each pair into a vector with convolution and pooling operations. The vectors are then accumulated in chronological order through a recurrent neural network (RNN) that models relationships among the utterances. The final matching score is calculated from the hidden states of the RNN. Empirical study on two public data sets shows that SMN can significantly outperform state-of-the-art methods for response selection in multi-turn conversation.

pdf
Jointly Extracting Relations with Class Ties via Effective Deep Ranking
Hai Ye | Wenhan Chao | Zhunchen Luo | Zhoujun Li
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Connections between relations in relation extraction, which we call class ties, are common. In the distantly supervised scenario, one entity tuple may have multiple relation facts, so exploiting class ties between the relations of an entity tuple is promising for distantly supervised relation extraction. However, previous models either fail to model this property effectively or ignore it altogether. In this work, to effectively leverage class ties, we propose joint relation extraction with a unified model that integrates a convolutional neural network (CNN) with a general pairwise ranking framework, in which we introduce three novel ranking loss functions. Additionally, we present an effective method to relieve the severe class imbalance caused by the NR (not relation) class during model training. Experiments on a widely used dataset show that leveraging class ties enhances extraction and demonstrate the effectiveness of our model in learning class ties. Our model significantly outperforms the baselines, achieving state-of-the-art performance.
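A sketch of the common margin-based shape such ranking losses take (the paper's three specific loss functions are not reproduced here): push the score of the gold relation above the best wrong one by a margin.

    import torch

    def pairwise_margin_loss(scores, gold_idx, margin=1.0):
        """scores: [num_relations] class scores for one entity tuple."""
        gold = scores[gold_idx]
        wrong = torch.cat([scores[:gold_idx], scores[gold_idx + 1:]]).max()
        return torch.clamp(margin - gold + wrong, min=0.0)

    loss = pairwise_margin_loss(torch.randn(10), gold_idx=3)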

pdf
Chinese Answer Extraction Based on POS Tree and Genetic Algorithm
Shuihua Li | Xiaoming Zhang | Zhoujun Li
Proceedings of the 9th SIGHAN Workshop on Chinese Language Processing

Answer extraction is the most important part of a Chinese web-based question answering system. To enhance the robustness and adaptability of answer extraction to new domains, and to eliminate the influence of incomplete and noisy search snippets, we propose two new answer extraction methods. We utilize text patterns to generate Part-of-Speech (POS) patterns, and we propose a method that constructs a POS tree from these patterns. The POS tree is useful for candidate answer extraction in web-based question answering. To retrieve an efficient POS tree, similarities between questions are used to select question-answer pairs whose questions are similar to the unanswered question, and the POS tree is then refined based on these pairs. To rank the candidate answers, the weights of the leaf nodes of the POS tree are first calculated with a heuristic method and then trained with a Genetic Algorithm (GA). Experimental results with 10-fold cross-validation show that the weighted POS tree trained by GA improves the accuracy of answer extraction.

2016

pdf
DocChat: An Information Retrieval Approach for Chatbot Engines Using Unstructured Documents
Zhao Yan | Nan Duan | Junwei Bao | Peng Chen | Ming Zhou | Zhoujun Li | Jianshe Zhou
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Detecting Context Dependent Messages in a Conversational Environment
Chaozhuo Li | Yu Wu | Wei Wu | Chen Xing | Zhoujun Li | Ming Zhou
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

While automatic response generation for building chatbot systems has drawn a lot of attention recently, there is limited understanding of when we need to consider the linguistic context of an input text in the generation process. The task is challenging, as messages in a conversational environment are short and informal, and evidence that can indicate whether a message is context dependent is scarce. After a study of social conversation data crawled from the web, we observed that some characteristics estimated from the responses to messages are discriminative for identifying context-dependent messages. Using these characteristics as weak supervision, we propose a Long Short-Term Memory (LSTM) network to learn a classifier. Our method carries out text representation and classifier learning in a unified framework. Experimental results show that the proposed method significantly outperforms baseline methods on classification accuracy.

2014

pdf
Exploiting Timelines to Enhance Multi-document Summarization
Jun-Ping Ng | Yan Chen | Min-Yen Kan | Zhoujun Li
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf
A Semi-Supervised Bayesian Network Model for Microblog Topic Classification
Yan Chen | Zhoujun Li | Liqiang Nie | Xia Hu | Xiangyu Wang | Tat-Seng Chua | Xiaoming Zhang
Proceedings of COLING 2012

2011

pdf
A Graph-based Bilingual Corpus Selection Approach for SMT
Wenhan Chao | Zhoujun Li
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2010

pdf
Comparable Entity Mining from Comparative Questions
Shasha Li | Chin-Yew Lin | Young-In Song | Zhoujun Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics