Pascal Poupart


WatClaimCheck: A new Dataset for Claim Entailment and Inference
Kashif Khan | Ruizhe Wang | Pascal Poupart
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We contribute a new dataset for the task of automated fact checking and an evaluation of state-of-the-art algorithms. The dataset includes claims (from speeches, interviews, social media and news articles), review articles published by professional fact checkers, and premise articles used by those fact checkers to support their reviews and verify the claims. An important challenge in the use of premise articles is the identification of relevant passages that help to infer the veracity of a claim. We show that transferring a dense passage retrieval model trained with review articles improves the retrieval quality of passages in premise articles. We report results for the prediction of claim veracity by inference from premise articles.

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md Akmal Haidar | Nithin Anchuri | Mehdi Rezagholizadeh | Abbas Ghaddar | Philippe Langlais | Pascal Poupart
Findings of the Association for Computational Linguistics: NAACL 2022

Intermediate layer knowledge distillation (KD) can improve on the standard KD technique (which only targets the output of teacher and student models), especially for large pre-trained language models. However, intermediate layer distillation suffers from an excessive computational burden and from the engineering effort required to set up a proper layer mapping. To address these problems, we propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which intermediate layers from the teacher model are selected randomly to be distilled into the intermediate layers of the student model. This randomized selection ensures that all teacher layers are taken into account in the training process, while reducing the computational cost of intermediate layer distillation. We also show that it acts as a regularizer that improves the generalizability of the student model. We perform extensive experiments on GLUE tasks as well as on out-of-domain test sets. We show that our proposed RAIL-KD approach considerably outperforms other state-of-the-art intermediate layer KD methods in both performance and training time.
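
The random layer selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the choice to sort the sampled teacher layers so that depth order is preserved, and the use of a mean-squared error between hidden vectors of equal size are all assumptions made for illustration.

```python
import random


def sample_layer_mapping(num_teacher_layers, num_student_layers, rng):
    """Draw one teacher layer per student layer, uniformly without replacement.

    Assumption: the sampled teacher indices are sorted so that shallower
    student layers are paired with shallower teacher layers.
    """
    if num_student_layers > num_teacher_layers:
        raise ValueError("student has more intermediate layers than teacher")
    chosen = sorted(rng.sample(range(num_teacher_layers), num_student_layers))
    # Each pair is (student_layer_index, teacher_layer_index).
    return list(enumerate(chosen))


def layer_mse(student_hidden, teacher_hidden):
    """Mean-squared error between two hidden vectors of the same length."""
    diffs = [(s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)]
    return sum(diffs) / len(diffs)


def intermediate_loss(student_layers, teacher_layers, mapping):
    """Average the per-layer loss over the sampled layer pairs."""
    losses = [layer_mse(student_layers[s], teacher_layers[t]) for s, t in mapping]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    # A fresh mapping would be drawn repeatedly during training, so that
    # every teacher layer is eventually used.
    for step in range(3):
        print(step, sample_layer_mapping(12, 6, rng))
```

Because only as many teacher layers as the student has are used at any one time, the per-step cost matches a fixed mapping, while resampling lets every teacher layer contribute over the course of training.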

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization
Aref Jafari | Ivan Kobyzev | Mehdi Rezagholizadeh | Pascal Poupart | Ali Ghodsi
Findings of the Association for Computational Linguistics: EMNLP 2022

Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve the generalization of a small model (the student) by transferring knowledge from a larger model (the teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems that limit their performance. The literature shows that the capacity gap between the teacher and the student networks can make KD ineffective. Additionally, existing KD techniques do not mitigate the noise in the teacher’s output: modeling the noisy behaviour of the teacher can distract the student from learning more useful features. We propose a new KD method that addresses these problems and makes training easier than with previous techniques. Inspired by continuation optimization, we design a training procedure that optimizes the highly non-convex KD objective by starting with a smoothed version of this objective and making it more complex as training proceeds. Our method (Continuation-KD) achieves state-of-the-art performance across various compact architectures on NLU (GLUE benchmark) and computer vision tasks (CIFAR-10 and CIFAR-100).

CILDA: Contrastive Data Augmentation Using Intermediate Layer Knowledge Distillation
Md Akmal Haidar | Mehdi Rezagholizadeh | Abbas Ghaddar | Khalil Bibi | Philippe Langlais | Pascal Poupart
Proceedings of the 29th International Conference on Computational Linguistics

Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained language models. Recent years have seen a surge of research aiming to improve KD by leveraging Contrastive Learning, Intermediate Layer Distillation, Data Augmentation, and Adversarial Training. In this work, we propose a learning-based data augmentation technique tailored for knowledge distillation, called CILDA. To the best of our knowledge, this is the first time that intermediate layer representations of the main task are used in improving the quality of augmented samples. More precisely, we introduce an augmentation technique for KD based on intermediate layer matching using contrastive loss to improve masked adversarial data augmentation. CILDA outperforms existing state-of-the-art KD approaches on the GLUE benchmark, as well as in an out-of-domain evaluation.


Variational Attention for Sequence-to-Sequence Models
Hareesh Bahuleyan | Lili Mou | Olga Vechtomova | Pascal Poupart
Proceedings of the 27th International Conference on Computational Linguistics

The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which in turn is decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent space may be bypassed by the attention model, and thus becomes ineffective. In this paper, we propose a variational attention mechanism for VED, where the attention vector is also modeled as Gaussian-distributed random variables. Results from two experiments show that, without loss of quality, our proposed method alleviates the bypassing phenomenon as it increases the diversity of generated sentences.
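
The idea of treating the attention vector as a Gaussian random variable can be sketched as follows. This is a rough illustration, not the paper's model. It assumes that the Gaussian mean is the usual softmax-weighted sum of encoder states, that a single scalar log-variance is shared across dimensions, and that sampling uses the reparameterization trick; all function names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mean(scores, source_states):
    """Deterministic attention vector: softmax-weighted sum of encoder states."""
    weights = softmax(scores)
    dim = len(source_states[0])
    return [sum(w * h[d] for w, h in zip(weights, source_states)) for d in range(dim)]


def sample_attention_vector(mean, log_var, rng):
    """Reparameterized sample: mean + std * eps, with eps drawn from N(0, 1).

    Assumption: one scalar log-variance shared by all dimensions.
    """
    std = math.exp(0.5 * log_var)
    return [m + std * rng.gauss(0.0, 1.0) for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    states = [[1.0, 2.0], [3.0, 4.0]]
    mean = attention_mean([0.0, 0.0], states)
    print("mean:", mean)
    print("sample:", sample_attention_vector(mean, 0.0, rng))
```

With a very small variance the sample collapses to the deterministic attention vector, which is the setting the abstract describes as letting the decoder bypass the latent space.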


Deep Active Learning for Dialogue Generation
Nabiha Asghar | Pascal Poupart | Xin Jiang | Hang Li
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

We propose an online, end-to-end, neural generative conversational model for open-domain dialogue. It is trained using a unique combination of offline two-phase supervised learning and online human-in-the-loop active learning. While most existing research proposes offline supervision or hand-crafted reward functions for online reinforcement, we devise a novel interactive learning mechanism based on Hamming-diverse beam search for response generation and one-character user feedback at each step. Experiments show that our model inherently promotes the generation of semantically relevant and interesting responses, and can be used to train agents with customized personas, moods and conversational styles.
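
The abstract mentions Hamming-diverse beam search without detail. The sketch below shows only the general Hamming diversity penalty used in diverse beam search, where a token already chosen by earlier beam groups at the same time step has its score reduced. It is an illustration under assumptions, not the paper's decoder; the function names, the toy vocabulary, and the penalty strength are hypothetical.

```python
from collections import Counter


def hamming_penalized_scores(log_probs, previous_group_tokens, strength):
    """Subtract strength times the number of earlier groups that chose each token."""
    counts = Counter(previous_group_tokens)
    return {tok: lp - strength * counts[tok] for tok, lp in log_probs.items()}


def pick_token(log_probs, previous_group_tokens, strength):
    """Greedy choice for one group at one time step under the penalty."""
    scores = hamming_penalized_scores(log_probs, previous_group_tokens, strength)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy next-token log-probabilities shared by all groups (an assumption).
    log_probs = {"yes": -0.1, "sure": -0.5, "no": -2.0}
    chosen = []
    for group in range(2):
        chosen.append(pick_token(log_probs, chosen, strength=1.0))
    print(chosen)
```

In this toy run the second group avoids the token the first group picked, which is how the penalty pushes candidate responses apart before the user selects one.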


Overfitting at SemEval-2016 Task 3: Detecting Semantically Similar Questions in Community Question Answering Forums with Word Embeddings
Hujie Wang | Pascal Poupart
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)


Generating Lexical Analogies Using Dependency Relations
Andy Chiu | Pascal Poupart | Chrysanne DiMarco
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)


Partially Observable Markov Decision Processes with Continuous Observations for Dialogue Management
Jason D. Williams | Pascal Poupart | Steve Young
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue