Weiping Wang


2022

pdf
TAKE: Topic-shift Aware Knowledge sElection for Dialogue Generation
Chenxu Yang | Zheng Lin | Jiangnan Li | Fandong Meng | Weiping Wang | Lanrui Wang | Jie Zhou
Proceedings of the 29th International Conference on Computational Linguistics

Knowledge-grounded dialogue generation consists of two subtasks: knowledge selection and response generation. The knowledge selector generally constructs a query based on the dialogue context and selects the most appropriate knowledge to help response generation. Recent work finds that realizing who (the user or the agent) holds the initiative and utilizing the role-initiative information to instruct the query construction can help select knowledge. It depends on whether the knowledge connection between two adjacent rounds is smooth to assign the role. However, whereby the user takes the initiative only when there is a strong semantic transition between two rounds, probably leading to initiative misjudgment. Therefore, it is necessary to seek a more sensitive reason beyond the initiative role for knowledge selection. To address the above problem, we propose a Topic-shift Aware Knowledge sElector(TAKE). Specifically, we first annotate the topic shift and topic inheritance labels in multi-round dialogues with distant supervision. Then, we alleviate the noise problem in pseudo labels through curriculum learning and knowledge distillation. Extensive experiments on WoW show that TAKE performs better than strong baselines.

pdf
Target Really Matters: Target-aware Contrastive Learning and Consistency Regularization for Few-shot Stance Detection
Rui Liu | Zheng Lin | Huishan Ji | Jiangnan Li | Peng Fu | Weiping Wang
Proceedings of the 29th International Conference on Computational Linguistics

Stance detection aims to identify the attitude from an opinion towards a certain target. Despite the significant progress on this task, it is extremely time-consuming and budget-unfriendly to collect sufficient high-quality labeled data for every new target under fully-supervised learning, whereas unlabeled data can be collected easier. Therefore, this paper is devoted to few-shot stance detection and investigating how to achieve satisfactory results in semi-supervised settings. As a target-oriented task, the core idea of semi-supervised few-shot stance detection is to make better use of target-relevant information from labeled and unlabeled data. Therefore, we develop a novel target-aware semi-supervised framework. Specifically, we propose a target-aware contrastive learning objective to learn more distinguishable representations for different targets. Such an objective can be easily applied with or without unlabeled data. Furthermore, to thoroughly exploit the unlabeled data and facilitate the model to learn target-relevant stance features in the opinion content, we explore a simple but effective target-aware consistency regularization combined with a self-training strategy. The experimental results demonstrate that our approach can achieve state-of-the-art performance on multiple benchmark datasets in the few-shot setting.

pdf
Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training
Yuanxin Liu | Fandong Meng | Zheng Lin | Peng Fu | Yanan Cao | Weiping Wang | Jie Zhou
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original PLM. These subnetworks are found using magnitude-based pruning. In this paper, we find that the BERT subnetworks have even more potential than these studies have shown. Firstly, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with the downstream transferability. Inspired by this, we propose to directly optimize the subnetwork structure towards the pre-training objectives, which can better preserve the pre-training performance. Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream tasks. We then fine-tune the subnetworks on the GLUE benchmark and the SQuAD dataset. The results show that, compared with magnitude pruning, mask training can effectively find BERT subnetworks with improved overall performance on downstream tasks. Moreover, our method is also more efficient in searching subnetworks and more advantageous when fine-tuning within a certain range of data scarcity. Our code is available at https://github.com/llyx97/TAMT.

2021

pdf
Enhancing Zero-shot and Few-shot Stance Detection with Commonsense Knowledge Graph
Rui Liu | Zheng Lin | Yutong Tan | Weiping Wang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf
Past, Present, and Future: Conversational Emotion Recognition through Structural Modeling of Psychological Knowledge
Jiangnan Li | Zheng Lin | Peng Fu | Weiping Wang
Findings of the Association for Computational Linguistics: EMNLP 2021

Conversational Emotion Recognition (CER) is a task to predict the emotion of an utterance in the context of a conversation. Although modeling the conversational context and interactions between speakers has been studied broadly, it is important to consider the speaker’s psychological state, which controls the action and intention of the speaker. The state-of-the-art method introduces CommonSense Knowledge (CSK) to model psychological states in a sequential way (forwards and backwards). However, it ignores the structural psychological interactions between utterances. In this paper, we propose a pSychological-Knowledge-Aware Interaction Graph (SKAIG). In the locally connected graph, the targeted utterance will be enhanced with the information of action inferred from the past context and intention implied by the future context. The utterance is self-connected to consider the present effect from itself. Furthermore, we utilize CSK to enrich edges with knowledge representations and process the SKAIG with a graph transformer. Our method achieves state-of-the-art and competitive performance on four popular CER datasets.

pdf
Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation
Yuanxin Liu | Fandong Meng | Zheng Lin | Weiping Wang | Jie Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recently, knowledge distillation (KD) has shown great success in BERT compression. Instead of only learning from the teacher’s soft label as in conventional KD, researchers find that the rich information contained in the hidden layers of BERT is conducive to the student’s performance. To better exploit the hidden knowledge, a common practice is to force the student to deeply mimic the teacher’s hidden states of all the tokens in a layer-wise manner. In this paper, however, we observe that although distilling the teacher’s hidden state knowledge (HSK) is helpful, the performance gain (marginal utility) diminishes quickly as more HSK is distilled. To understand this effect, we conduct a series of analysis. Specifically, we divide the HSK of BERT into three dimensions, namely depth, length and width. We first investigate a variety of strategies to extract crucial knowledge for each single dimension and then jointly compress the three dimensions. In this way, we show that 1) the student’s performance can be improved by extracting and distilling the crucial HSK, and 2) using a tiny fraction of HSK can achieve the same performance as extensive HSK distillation. Based on the second finding, we further propose an efficient KD paradigm to compress BERT, which does not require loading the teacher during the training of student. For two kinds of student models and computing devices, the proposed KD paradigm gives rise to training speedup of 2.7x 3.4x.

pdf
Check It Again:Progressive Visual Question Answering via Visual Entailment
Qingyi Si | Zheng Lin | Ming yu Zheng | Peng Fu | Weiping Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While sophisticated neural-based models have achieved remarkable success in Visual Question Answering (VQA), these models tend to answer questions only according to superficial correlations between question and answer. Several recent approaches have been developed to address this language priors problem. However, most of them predict the correct answer according to one best output without checking the authenticity of answers. Besides, they only explore the interaction between image and question, ignoring the semantics of candidate answers. In this paper, we propose a select-and-rerank (SAR) progressive framework based on Visual Entailment. Specifically, we first select the candidate answers relevant to the question or the image, then we rerank the candidate answers by a visual entailment task, which verifies whether the image semantically entails the synthetic statement of the question and each candidate answer. Experimental results show the effectiveness of our proposed framework, which establishes a new state-of-the-art accuracy on VQA-CP v2 with a 7.55% improvement.

2020

pdf
Modeling Intra and Inter-modality Incongruity for Multi-Modal Sarcasm Detection
Hongliang Pan | Zheng Lin | Peng Fu | Yatao Qi | Weiping Wang
Findings of the Association for Computational Linguistics: EMNLP 2020

Sarcasm is a pervasive phenomenon in today’s social media platforms such as Twitter and Reddit. These platforms allow users to create multi-modal messages, including texts, images, and videos. Existing multi-modal sarcasm detection methods either simply concatenate the features from multi modalities or fuse the multi modalities information in a designed manner. However, they ignore the incongruity character in sarcastic utterance, which is often manifested between modalities or within modalities. Inspired by this, we propose a BERT architecture-based model, which concentrates on both intra and inter-modality incongruity for multi-modal sarcasm detection. To be specific, we are inspired by the idea of self-attention mechanism and design inter-modality attention to capturing inter-modality incongruity. In addition, the co-attention mechanism is applied to model the contradiction within the text. The incongruity information is then used for prediction. The experimental results demonstrate that our model achieves state-of-the-art performance on a public multi-modal sarcasm detection dataset.

2019

pdf
Ranking and Sampling in Open-Domain Question Answering
Yanfu Xu | Zheng Lin | Yuanxin Liu | Rui Liu | Weiping Wang | Dan Meng
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Open-domain question answering (OpenQA) aims to answer questions based on a number of unlabeled paragraphs. Existing approaches always follow the distantly supervised setup where some of the paragraphs are wrong-labeled (noisy), and mainly utilize the paragraph-question relevance to denoise. However, the paragraph-paragraph relevance, which may aggregate the evidence among relevant paragraphs, can also be utilized to discover more useful paragraphs. Moreover, current approaches mainly focus on the positive paragraphs which are known to contain the answer during training. This will affect the generalization ability of the model and make it be disturbed by the similar but irrelevant (distracting) paragraphs during testing. In this paper, we first introduce a ranking model leveraging the paragraph-question and the paragraph-paragraph relevance to compute a confidence score for each paragraph. Furthermore, based on the scores, we design a modified weighted sampling strategy for training to mitigate the influence of the noisy and distracting paragraphs. Experiments on three public datasets (Quasar-T, SearchQA and TriviaQA) show that our model advances the state of the art.