Although researches on word embeddings have made great progress in recent years, many tasks in natural language processing are on the sentence level. Thus, it is essential to learn sentence embeddings. Recently, Sentence BERT (SBERT) is proposed to learn embeddings on the sentence level, and it uses the inner product (or, cosine similarity) to compute semantic similarity between sentences. However, this measurement cannot well describe the semantic structures among sentences. The reason is that sentences may lie on a manifold in the ambient space rather than distribute in an Euclidean space. Thus, cosine similarity cannot approximate distances on the manifold. To tackle the severe problem, we propose a novel sentence embedding method called Sentence BERT with Locality Preserving (SBERT-LP), which discovers the sentence submanifold from a high-dimensional space and yields a compact sentence representation subspace by locally preserving geometric structures of sentences. We compare the SBERT-LP with several existing sentence embedding approaches from three perspectives: sentence similarity, sentence classification and sentence clustering. Experimental results and case studies demonstrate that our method encodes sentences better in the sense of semantic structures.
Providing a reliable explanation for clinical diagnosis based on the Electronic Medical Record (EMR) is fundamental to the application of Artificial Intelligence in the medical field. Current methods mostly treat the EMR as a text sequence and provide explanations based on a precise medical knowledge base, which is disease-specific and difficult to obtain for experts in reality. Therefore, we propose a counterfactual multi-granularity graph supporting facts extraction (CMGE) method to extract supporting facts from irregular EMR itself without external knowledge bases in this paper. Specifically, we first structure the sequence of EMR into a hierarchical graph network and then obtain the causal relationship between multi-granularity features and diagnosis results through counterfactual intervention on the graph. Features having the strongest causal connection with the results provide interpretive support for the diagnosis. Experimental results on real Chinese EMR of the lymphedema demonstrate that our method can diagnose four types of EMR correctly, and can provide accurate supporting facts for the results. More importantly, the results on different diseases demonstrate the robustness of our approach, which represents the potential application in the medical field.
Knowledge selection plays an important role in knowledge-grounded dialogue, which is a challenging task to generate more informative responses by leveraging external knowledge. Recently, latent variable models have been proposed to deal with the diversity of knowledge selection by using both prior and posterior distributions over knowledge and achieve promising performance. However, these models suffer from a huge gap between prior and posterior knowledge selection. Firstly, the prior selection module may not learn to select knowledge properly because of lacking the necessary posterior information. Secondly, latent variable models suffer from the exposure bias that dialogue generation is based on the knowledge selected from the posterior distribution at training but from the prior distribution at inference. Here, we deal with these issues on two aspects: (1) We enhance the prior selection module with the necessary posterior information obtained from the specially designed Posterior Information Prediction Module (PIPM); (2) We propose a Knowledge Distillation Based Training Strategy (KDBTS) to train the decoder with the knowledge selected from the prior distribution, removing the exposure bias of knowledge selection. Experimental results on two knowledge-grounded dialogue datasets show that both PIPM and KDBTS achieve performance improvement over the state-of-the-art latent variable model and their combination shows further improvement.
Emotion recognition in textual conversations (ERTC) plays an important role in a wide range of applications, such as opinion mining, recommender systems, and so on. ERTC, however, is a challenging task. For one thing, speakers often rely on the context and commonsense knowledge to express emotions; for another, most utterances contain neutral emotion in conversations, as a result, the confusion between a few non-neutral utterances and much more neutral ones restrains the emotion recognition performance. In this paper, we propose a novel Knowledge Aware Incremental Transformer with Multi-task Learning (KAITML) to address these challenges. Firstly, we devise a dual-level graph attention mechanism to leverage commonsense knowledge, which augments the semantic information of the utterance. Then we apply the Incremental Transformer to encode multi-turn contextual utterances. Moreover, we are the first to introduce multi-task learning to alleviate the aforementioned confusion and thus further improve the emotion recognition performance. Extensive experimental results show that our KAITML model outperforms the state-of-the-art models across five benchmark datasets.
Recently, to incorporate external Knowledge Base (KB) information, one form of world knowledge, several end-to-end task-oriented dialog systems have been proposed. These models, however, tend to confound the dialog history with KB tuples and simply store them into one memory. Inspired by the psychological studies on working memory, we propose a working memory model (WMM2Seq) for dialog response generation. Our WMM2Seq adopts a working memory to interact with two separated long-term memories, which are the episodic memory for memorizing dialog history and the semantic memory for storing KB tuples. The working memory consists of a central executive to attend to the aforementioned memories, and a short-term storage system to store the “activated” contents from the long-term memories. Furthermore, we introduce a context-sensitive perceptual process for the token representations of dialog history, and then feed them into the episodic memory. Extensive experiments on two task-oriented dialog datasets demonstrate that our WMM2Seq significantly outperforms the state-of-the-art results in several evaluation metrics.
Visual Dialog is a multi-modal task that requires a model to participate in a multi-turn human dialog grounded on an image, and generate correct, human-like responses. In this paper, we propose a novel Adversarial Multi-modal Feature Encoding (AMFE) framework for effective and robust auxiliary training of visual dialog systems. AMFE can force the language-encoding part of a model to generate hidden states in a distribution closely related to the distribution of real-world images, resulting in language features containing general knowledge from both modalities by nature, which can help generate both more correct and more general responses with reasonably low time cost. Experimental results show that AMFE can steadily bring performance gains to different models on different scales of data. Our method outperforms both the supervised learning baselines and other fine-tuning methods, achieving state-of-the-art results on most metrics of VisDial v0.5/v0.9 generative tasks.
Visual reasoning is a special visual question answering problem that is multi-step and compositional by nature, and also requires intensive text-vision interactions. We propose CMM: Cascaded Mutual Modulation as a novel end-to-end visual reasoning model. CMM includes a multi-step comprehension process for both question and image. In each step, we use a Feature-wise Linear Modulation (FiLM) technique to enable textual/visual pipeline to mutually control each other. Experiments show that CMM significantly outperforms most related models, and reach state-of-the-arts on two visual reasoning benchmarks: CLEVR and NLVR, collected from both synthetic and natural languages. Ablation studies confirm the effectiveness of CMM to comprehend natural language logics under the guidence of images. Our code is available at https://github.com/FlamingHorizon/CMM-VR.
Homographic puns have a long history in human writing, widely used in written and spoken literature, which usually occur in a certain syntactic or stylistic structure. How to recognize homographic puns is an important research. However, homographic pun recognition does not solve very well in existing work. In this work, we first use WordNet to understand and expand word embedding for settling the polysemy of homographic puns, and then propose a WordNet-Encoded Collocation-Attention network model (WECA) which combined with the context weights for recognizing the puns. Our experiments on the SemEval2017 Task7 and Pun of the Day demonstrate that the proposed model is able to distinguish between homographic pun and non-homographic pun texts. We show the effectiveness of the model to present the capability of choosing qualitatively informative words. The results show that our model achieves the state-of-the-art performance on homographic puns recognition.
Unsupervised neural machine translation (NMT) is a recently proposed approach for machine translation which aims to train the model without using any labeled data. The models proposed for unsupervised NMT often use only one shared encoder to map the pairs of sentences from different languages to a shared-latent space, which is weak in keeping the unique and internal characteristics of each language, such as the style, terminology, and sentence structure. To address this issue, we introduce an extension by utilizing two independent encoders but sharing some partial weights which are responsible for extracting high-level representations of the input sentences. Besides, two different generative adversarial networks (GANs), namely the local GAN and global GAN, are proposed to enhance the cross-language translation. With this new approach, we achieve significant improvements on English-German, English-French and Chinese-to-English translation tasks.
Metaphors are frequently used to convey emotions. However, there is little research on the construction of metaphor corpora annotated with emotion for the analysis of emotionality of metaphorical expressions. Furthermore, most studies focus on English, and few in other languages, particularly Sino-Tibetan languages such as Chinese, for emotion analysis from metaphorical texts, although there are likely to be many differences in emotional expressions of metaphorical usages across different languages. We therefore construct a significant new corpus on metaphor, with 5,605 manually annotated sentences in Chinese. We present an annotation scheme that contains annotations of linguistic metaphors, emotional categories (joy, anger, sadness, fear, love, disgust and surprise), and intensity. The annotation agreement analyses for multiple annotators are described. We also use the corpus to explore and analyze the emotionality of metaphors. To the best of our knowledge, this is the first relatively large metaphor corpus with an annotation of emotions in Chinese.
While the disfluency detection has achieved notable success in the past years, it still severely suffers from the data scarcity. To tackle this problem, we propose a novel semi-supervised approach which can utilize large amounts of unlabelled data. In this work, a light-weight neural net is proposed to extract the hidden features based solely on self-attention without any Recurrent Neural Network (RNN) or Convolutional Neural Network (CNN). In addition, we use the unlabelled corpus to enhance the performance. Besides, the Generative Adversarial Network (GAN) training is applied to enforce the similar distribution between the labelled and unlabelled data. The experimental results show that our approach achieves significant improvements over strong baselines.
This paper proposes an approach for applying GANs to NMT. We build a conditional sequence generative adversarial net which comprises of two adversarial sub models, a generator and a discriminator. The generator aims to generate sentences which are hard to be discriminated from human-translated sentences ( i.e., the golden target sentences); And the discriminator makes efforts to discriminate the machine-generated sentences from human-translated ones. The two sub models play a mini-max game and achieve the win-win situation when they reach a Nash Equilibrium. Additionally, the static sentence-level BLEU is utilized as the reinforced objective for the generator, which biases the generation towards high BLEU points. During training, both the dynamic discriminator and the static BLEU objective are employed to evaluate the generated sentences and feedback the evaluations to guide the learning of the generator. Experimental results show that the proposed model consistently outperforms the traditional RNNSearch and the newly emerged state-of-the-art Transformer on English-German and Chinese-English translation tasks.
Neural Machine Translation (NMT) lays intensive burden on computation and memory cost. It is a challenge to deploy NMT models on the devices with limited computation and memory budgets. This paper presents a four stage pipeline to compress model and speed up the decoding for NMT. Our method first introduces a compact architecture based on convolutional encoder and weight shared embeddings. Then weight pruning is applied to obtain a sparse model. Next, we propose a fast sequence interpolation approach which enables the greedy decoding to achieve performance on par with the beam search. Hence, the time-consuming beam search can be replaced by simple greedy decoding. Finally, vocabulary selection is used to reduce the computation of softmax layer. Our final model achieves 10 times speedup, 17 times parameters reduction, less than 35MB storage size and comparable performance compared to the baseline model.
Character-based sequence labeling framework is flexible and efficient for Chinese word segmentation (CWS). Recently, many character-based neural models have been applied to CWS. While they obtain good performance, they have two obvious weaknesses. The first is that they heavily rely on manually designed bigram feature, i.e. they are not good at capturing n-gram features automatically. The second is that they make no use of full word information. For the first weakness, we propose a convolutional neural model, which is able to capture rich n-gram features without any feature engineering. For the second one, we propose an effective approach to integrate the proposed model with word embeddings. We evaluate the model on two benchmark datasets: PKU and MSR. Without any feature engineering, the model obtains competitive performance — 95.7% on PKU and 97.3% on MSR. Armed with word embeddings, the model achieves state-of-the-art performance on both datasets — 96.5% on PKU and 98.0% on MSR, without using any external labeled resource.
Joint extraction of entities and relations is an important task in information extraction. To tackle this problem, we firstly propose a novel tagging scheme that can convert the joint extraction task to a tagging problem.. Then, based on our tagging scheme, we study different end-to-end models to extract entities and their relations directly, without identifying entities and relations separately. We conduct experiments on a public dataset produced by distant supervision method and the experimental results show that the tagging based methods are better than most of the existing pipelined and joint learning methods. What’s more, the end-to-end model proposed in this paper, achieves the best results on the public dataset.
Question answering is always an attractive and challenging task in natural language processing area. There are some open domain question answering systems, such as IBM Waston, which take the unstructured text data as input, in some ways of humanlike thinking process and a mode of artificial intelligence. At the conference on Natural Language Processing and Chinese Computing (NLPCC) 2016, China Computer Federation hosted a shared task evaluation about Open Domain Question Answering. We achieve the 2nd place at the document-based subtask. In this paper, we present our solution, which consists of feature engineering in lexical and semantic aspects and model training methods. As the result of the evaluation shows, our solution provides a valuable and brief model which could be used in modelling question answering or sentence semantic relevance. We hope our solution would contribute to this vast and significant task with some heuristic thinking.
Recently, end-to-end memory networks have shown promising results on Question Answering task, which encode the past facts into an explicit memory and perform reasoning ability by making multiple computational steps on the memory. However, memory networks conduct the reasoning on sentence-level memory to output coarse semantic vectors and do not further take any attention mechanism to focus on words, which may lead to the model lose some detail information, especially when the answers are rare or unknown words. In this paper, we propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the past facts into sentence-level memory and word-level memory respectively. Then, k-max pooling is exploited following reasoning module on the sentence-level memory to sample the k most relevant sentences to a question and feed these sentences into attention mechanism on the word-level memory to focus the words in the selected sentences. Finally, the prediction is jointly learned over the outputs of the sentence-level reasoning module and the word-level attention mechanism. The experimental results demonstrate that our approach successfully conducts answer selection on unknown words and achieves a better performance than memory networks.
Recently, end-to-end memory networks have shown promising results on Question Answering task, which encode the past facts into an explicit memory and perform reasoning ability by making multiple computational steps on the memory. However, memory networks conduct the reasoning on sentence-level memory to output coarse semantic vectors and do not further take any attention mechanism to focus on words, which may lead to the model lose some detail information, especially when the answers are rare or unknown words. In this paper, we propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the past facts into sentence-level memory and word-level memory respectively. Then, k-max pooling is exploited following reasoning module on the sentence-level memory to sample the k most relevant sentences to a question and feed these sentences into attention mechanism on the word-level memory to focus the words in the selected sentences. Finally, the prediction is jointly learned over the outputs of the sentence-level reasoning module and the word-level attention mechanism. The experimental results demonstrate that our approach successfully conducts answer selection on unknown words and achieves a better performance than memory networks.
This article proposes a novel character-aware neural machine translation (NMT) model that views the input sequences as sequences of characters rather than words. On the use of row convolution (Amodei et al., 2015), the encoder of the proposed model composes word-level information from the input sequences of characters automatically. Since our model doesn’t rely on the boundaries between each word (as the whitespace boundaries in English), it is also applied to languages without explicit word segmentations (like Chinese). Experimental results on Chinese-English translation tasks show that the proposed character-aware NMT model can achieve comparable translation performance with the traditional word based NMT models. Despite the target side is still word based, the proposed model is able to generate much less unknown words.
Recurrent Neural Network (RNN) is one of the most popular architectures used in Natural Language Processsing (NLP) tasks because its recurrent structure is very suitable to process variable-length text. RNN can utilize distributed representations of words by first converting the tokens comprising each text into vectors, which form a matrix. And this matrix includes two dimensions: the time-step dimension and the feature vector dimension. Then most existing models usually utilize one-dimensional (1D) max pooling operation or attention-based operation only on the time-step dimension to obtain a fixed-length vector. However, the features on the feature vector dimension are not mutually independent, and simply applying 1D pooling operation over the time-step dimension independently may destroy the structure of the feature representation. On the other hand, applying two-dimensional (2D) pooling operation over the two dimensions may sample more meaningful features for sequence modeling tasks. To integrate the features on both dimensions of the matrix, this paper explores applying 2D max pooling operation to obtain a fixed-length representation of the text. This paper also utilizes 2D convolution to sample more meaningful information of the matrix. Experiments are conducted on six text classification tasks, including sentiment analysis, question classification, subjectivity classification and newsgroup classification. Compared with the state-of-the-art models, the proposed models achieve excellent performance on 4 out of 6 tasks. Specifically, one of the proposed models achieves highest accuracy on Stanford Sentiment Treebank binary classification and fine-grained classification tasks.
In this paper, we describe the CASIA statistical machine translation (SMT) system for the IWSLT2013 Evaluation Campaign. We participated in the Chinese-English and English-Chinese translation tasks. For both of these tasks, we used a hierarchical phrase-based (HPB) decoder and made it as our baseline translation system. A number of techniques were proposed to deal with these translation tasks, including parallel sentence extraction, pre-processing, translation model (TM) optimization, language model (LM) interpolation, turning, and post-processing. With these techniques, the translation results were significantly improved compared with that of the baseline system.
MANOS (Multilingual Application Network for Olympic Services) project. aims to provide intelligent multilingual information services in 2008 Olympic Games. By narrowing down the general language technology, this paper gives an overview of our new work on Phrase-Based Statistical Machine Translation (PBT) under the framework of the MANOS. Starting with the construction of large scale Chinese-English corpus (sentence aligned) and introduction four methods to extract phrases, The promising results from PBT systems lead us to confidences for constructing a high-quality translation system and harmoniously integrate it into MANOS platform.