Richong Zhang


2022

pdf
An Unsupervised Multiple-Task and Multiple-Teacher Model for Cross-lingual Named Entity Recognition
Zhuoran Li | Chunming Hu | Xiaohui Guo | Junfan Chen | Wenyi Qin | Richong Zhang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cross-lingual named entity recognition task is one of the critical problems for evaluating the potential transfer learning techniques on low resource languages. Knowledge distillation using pre-trained multilingual language models between source and target languages have shown their superiority in transfer. However, existing cross-lingual distillation models merely consider the potential transferability between two identical single tasks across both domains. Other possible auxiliary tasks to improve the learning performance have not been fully investigated. In this study, based on the knowledge distillation framework and multi-task learning, we introduce the similarity metric model as an auxiliary task to improve the cross-lingual NER performance on the target domain. Specifically, an entity recognizer and a similarity evaluator are first trained in parallel as two teachers from the source domain. Then, two tasks in the student model are supervised by these teachers simultaneously. Empirical studies on the three datasets across 7 different languages confirm the effectiveness of the proposed model.

pdf
A Transformational Biencoder with In-Domain Negative Sampling for Zero-Shot Entity Linking
Kai Sun | Richong Zhang | Samuel Mensah | Yongyi Mao | Xudong Liu
Findings of the Association for Computational Linguistics: ACL 2022

Recent interest in entity linking has focused in the zero-shot scenario, where at test time the entity mention to be labelled is never seen during training, or may belong to a different domain from the source domain. Current work leverage pre-trained BERT with the implicit assumption that it bridges the gap between the source and target domain distributions. However, fine-tuned BERT has a considerable underperformance at zero-shot when applied in a different domain. We solve this problem by proposing a Transformational Biencoder that incorporates a transformation into BERT to perform a zero-shot transfer from the source domain during training. As like previous work, we rely on negative entities to encourage our model to discriminate the golden entities during training. To generate these negative entities, we propose a simple but effective strategy that takes the domain of the golden entity into perspective. Our experimental results on the benchmark dataset Zeshel show effectiveness of our approach and achieve new state-of-the-art.

pdf
E-VarM: Enhanced Variational Word Masks to Improve the Interpretability of Text Classification Models
Ling Ge | ChunMing Hu | Guanghui Ma | Junshuang Wu | Junfan Chen | JiHong Liu | Hong Zhang | Wenyi Qin | Richong Zhang
Proceedings of the 29th International Conference on Computational Linguistics

Enhancing the interpretability of text classification models can help increase the reliability of these models in real-world applications. Currently, most researchers focus on extracting task-specific words from inputs to improve the interpretability of the model. The competitive approaches exploit the Variational Information Bottleneck (VIB) to improve the performance of word masking at the word embedding layer to obtain task-specific words. However, these approaches ignore the multi-level semantics of the text, which can impair the interpretability of the model, and do not consider the risk of representation overlap caused by the VIB, which can impair the classification performance. In this paper, we propose an enhanced variational word masks approach, named E-VarM, to solve these two issues effectively. The E-VarM combines multi-level semantics from all hidden layers of the model to mask out task-irrelevant words and uses contrastive learning to readjust the distances between representations. Empirical studies on ten benchmark text classification datasets demonstrate that our approach outperforms the SOTA methods in simultaneously improving the interpretability and accuracy of the model.

2021

pdf
Hypernym Discovery via a Recurrent Mapping Model
Yuhang Bai | Richong Zhang | Fanshuang Kong | Junfan Chen | Yongyi Mao
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
Keyphrase Extraction with Incomplete Annotated Training Data
Yanfei Lei | Chunming Hu | Guanghui Ma | Richong Zhang
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Extracting keyphrases that summarize the main points of a document is a fundamental task in natural language processing. Supervised approaches to keyphrase extraction(KPE) are largely developed based on the assumption that the training data is fully annotated. However, due to the difficulty of keyphrase annotating, KPE models severely suffer from incomplete annotated problem in many scenarios. To this end, we propose a more robust training method that learns to mitigate the misguidance brought by unlabeled keyphrases. We introduce negative sampling to adjust training loss, and conduct experiments under different scenarios. Empirical studies on synthetic datasets and open domain dataset show that our model is robust to incomplete annotated problem and surpasses prior baselines. Extensive experiments on five scientific domain datasets of different scales demonstrate that our model is competitive with the state-of-the-art method.

2020

pdf
Learning VAE-LDA Models with Rounded Reparameterization Trick
Runzhi Tian | Yongyi Mao | Richong Zhang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The introduction of VAE provides an efficient framework for the learning of generative models, including generative topic models. However, when the topic model is a Latent Dirichlet Allocation (LDA) model, a central technique of VAE, the reparameterization trick, fails to be applicable. This is because no reparameterization form of Dirichlet distributions is known to date that allows the use of the reparameterization trick. In this work, we propose a new method, which we call Rounded Reparameterization Trick (RRT), to reparameterize Dirichlet distributions for the learning of VAE-LDA models. This method, when applied to a VAE-LDA model, is shown experimentally to outperform the existing neural topic models on several benchmark datasets and on a synthetic dataset.

pdf
Parallel Interactive Networks for Multi-Domain Dialogue State Generation
Junfan Chen | Richong Zhang | Yongyi Mao | Jie Xu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The dependencies between system and user utterances in the same turn and across different turns are not fully considered in existing multidomain dialogue state tracking (MDST) models. In this study, we argue that the incorporation of these dependencies is crucial for the design of MDST and propose Parallel Interactive Networks (PIN) to model these dependencies. Specifically, we integrate an interactive encoder to jointly model the in-turn dependencies and cross-turn dependencies. The slot-level context is introduced to extract more expressive features for different slots. And a distributed copy mechanism is utilized to selectively copy words from historical system utterances or historical user utterances. Empirical studies demonstrated the superiority of the proposed PIN model.

pdf
Recurrent Interaction Network for Jointly Extracting Entities and Classifying Relations
Kai Sun | Richong Zhang | Samuel Mensah | Yongyi Mao | Xudong Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The idea of using multi-task learning approaches to address the joint extraction of entity and relation is motivated by the relatedness between the entity recognition task and the relation classification task. Existing methods using multi-task learning techniques to address the problem learn interactions among the two tasks through a shared network, where the shared information is passed into the task-specific networks for prediction. However, such an approach hinders the model from learning explicit interactions between the two tasks to improve the performance on the individual tasks. As a solution, we design a multi-task learning model which we refer to as recurrent interaction network which allows the learning of interactions dynamically, to effectively model task-specific features for classification. Empirical studies on two real-world datasets confirm the superiority of the proposed model.

pdf
Neural Dialogue State Tracking with Temporally Expressive Networks
Junfan Chen | Richong Zhang | Yongyi Mao | Jie Xu
Findings of the Association for Computational Linguistics: EMNLP 2020

Dialogue state tracking (DST) is an important part of a spoken dialogue system. Existing DST models either ignore temporal feature dependencies across dialogue turns or fail to explicitly model temporal state dependencies in a dialogue. In this work, we propose Temporally Expressive Networks (TEN) to jointly model the two types of temporal dependencies in DST. The TEN model utilizes the power of recurrent networks and probabilistic graphical models. Evaluating on standard datasets, TEN is demonstrated to improve the accuracy of turn-level-state prediction and the state aggregation.

2019

pdf
Uncover the Ground-Truth Relations in Distant Supervision: A Neural Expectation-Maximization Framework
Junfan Chen | Richong Zhang | Yongyi Mao | Hongyu Guo | Jie Xu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Distant supervision for relation extraction enables one to effectively acquire structured relations out of very large text corpora with less human efforts. Nevertheless, most of the prior-art models for such tasks assume that the given text can be noisy, but their corresponding labels are clean. Such unrealistic assumption is contradictory with the fact that the given labels are often noisy as well, thus leading to significant performance degradation of those models on real-world data. To cope with this challenge, we propose a novel label-denoising framework that combines neural network with probabilistic modelling, which naturally takes into account the noisy labels during learning. We empirically demonstrate that our approach significantly improves the current art in uncovering the ground-truth relation labels.

pdf
Aspect-Level Sentiment Analysis Via Convolution over Dependency Tree
Kai Sun | Richong Zhang | Samuel Mensah | Yongyi Mao | Xudong Liu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose a method based on neural networks to identify the sentiment polarity of opinion words expressed on a specific aspect of a sentence. Although a large majority of works typically focus on leveraging the expressive power of neural networks in handling this task, we explore the possibility of integrating dependency trees with neural networks for representation learning. To this end, we present a convolution over a dependency tree (CDT) model which exploits a Bi-directional Long Short Term Memory (Bi-LSTM) to learn representations for features of a sentence, and further enhance the embeddings with a graph convolutional network (GCN) which operates directly on the dependency tree of the sentence. Our approach propagates both contextual and dependency information from opinion words to aspect words, offering discriminative properties for supervision. Experimental results ranks our approach as the new state-of-the-art in aspect-based sentiment classification.

2018

pdf
Syntax Encoding with Application in Authorship Attribution
Richong Zhang | Zhiyuan Hu | Hongyu Guo | Yongyi Mao
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a novel strategy to encode the syntax parse tree of sentence into a learnable distributed representation. The proposed syntax encoding scheme is provably information-lossless. In specific, an embedding vector is constructed for each word in the sentence, encoding the path in the syntax tree corresponding to the word. The one-to-one correspondence between these “syntax-embedding” vectors and the words (hence their embedding vectors) in the sentence makes it easy to integrate such a representation with all word-level NLP models. We empirically show the benefits of the syntax embeddings on the Authorship Attribution domain, where our approach improves upon the prior art and achieves new performance records on five benchmarking data sets.

pdf
The APVA-TURBO Approach To Question Answering in Knowledge Base
Yue Wang | Richong Zhang | Cheng Xu | Yongyi Mao
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we study the problem of question answering over knowledge base. We identify that the primary bottleneck in this problem is the difficulty in accurately predicting the relations connecting the subject entity to the object entities. We advocate a new model architecture, APVA, which includes a verification mechanism responsible for checking the correctness of predicted relations. The APVA framework naturally supports a well-principled iterative training procedure, which we call turbo training. We demonstrate via experiments that the APVA-TUBRO approach drastically improves the question answering performance.