Khalil Bibi


2022

CILDA: Contrastive Data Augmentation Using Intermediate Layer Knowledge Distillation
Md Akmal Haidar | Mehdi Rezagholizadeh | Abbas Ghaddar | Khalil Bibi | Phillippe Langlais | Pascal Poupart
Proceedings of the 29th International Conference on Computational Linguistics

Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained language models. Recent years have seen a surge of research that aims to improve KD by leveraging contrastive learning, intermediate layer distillation, data augmentation, and adversarial training. In this work, we propose CILDA, a learning-based data augmentation technique tailored for knowledge distillation. To the best of our knowledge, this is the first time that intermediate layer representations of the main task have been used to improve the quality of augmented samples. More precisely, our augmentation technique for KD uses intermediate layer matching with a contrastive loss to improve masked adversarial data augmentation. CILDA outperforms existing state-of-the-art KD approaches on the GLUE benchmark, as well as in an out-of-domain evaluation.
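The abstract names intermediate layer matching with a contrastive loss as a key ingredient. The sketch below illustrates only that ingredient, using a generic InfoNCE-style loss; it is not CILDA's full pipeline, which also involves masked adversarial augmentation. Assumptions (not from the abstract): student hidden states are already projected to the teacher's width, each batch row pairs a student and a teacher representation of the same input, and `tau` is a hypothetical temperature.

```python
import numpy as np

def contrastive_layer_loss(student_h, teacher_h, tau=0.1):
    """Generic InfoNCE loss for matching student/teacher intermediate layers.

    student_h, teacher_h: arrays of shape (batch, dim). Row i of each comes from
    the same input, so pair (i, i) is the positive and pairs (i, j != i) are
    negatives. Assumes the student has already been projected to the teacher's dim.
    """
    # L2-normalise rows so dot products become cosine similarities
    s = student_h / np.linalg.norm(student_h, axis=1, keepdims=True)
    t = teacher_h / np.linalg.norm(teacher_h, axis=1, keepdims=True)
    logits = s @ t.T / tau  # (batch, batch) similarity matrix
    # numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy whose target for row i is column i (the matching sample)
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
aligned = contrastive_layer_loss(teacher + 0.01 * rng.normal(size=(8, 16)), teacher)
unrelated = contrastive_layer_loss(rng.normal(size=(8, 16)), teacher)
print(aligned, unrelated)
```

A student whose representations track the teacher's gets a low loss. Representations unrelated to the teacher score close to the chance level of roughly log(batch size).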

2021

Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj | Abbas Ghaddar | Ahmad Rashid | Khalil Bibi | Chengyang Li | Ali Ghodsi | Phillippe Langlais | Mehdi Rezagholizadeh
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Knowledge Distillation (KD) is extensively used to compress and deploy large pre-trained language models on edge devices for real-world applications. However, one neglected area of research is the impact of noisy (corrupted) labels on KD. We present, to the best of our knowledge, the first study of KD with noisy labels in Natural Language Understanding (NLU). We document the scope of the problem and present two methods to mitigate the impact of label noise. Experiments on the GLUE benchmark show that our methods are effective even under high noise levels. Nevertheless, our results indicate that more research is needed to cope with label noise in KD.
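Label noise reaches a distilled student through the ground-truth labels in its training objective. The sketch below shows this using the common soft-target KD loss: a hard-label cross-entropy term plus a temperature-scaled KL term toward the teacher. This is a generic textbook objective, not either of the paper's mitigation methods. The weights `alpha` and `T` and the helper names are illustrative assumptions.

```python
import numpy as np

def log_softmax(z, T=1.0):
    # temperature-scaled, numerically stable log-softmax over the last axis
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Generic soft-target KD objective, averaged over the batch.

    alpha weights the hard-label cross-entropy and (1 - alpha) weights the
    T^2-scaled KL(teacher_T || student_T). Only the cross-entropy term reads
    `labels`, so corrupted labels enter the loss through that term alone.
    """
    labels = np.asarray(labels)
    rows = np.arange(len(labels))
    ce = -np.mean(log_softmax(student_logits)[rows, labels])
    log_p_t = log_softmax(teacher_logits, T)
    log_p_s = log_softmax(student_logits, T)
    kl = np.mean(np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=1))
    return alpha * ce + (1 - alpha) * T**2 * kl

student = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
teacher = np.array([[3.0, 0.0, -2.0], [0.0, 2.0, 0.0]])
clean, noisy = [0, 1], [2, 1]  # first label flipped to simulate corruption
print(kd_loss(student, teacher, clean), kd_loss(student, teacher, noisy))
```

With `alpha=0` the loss ignores the labels entirely, so the choice of `alpha` controls how much a corrupted label can pull the student away from the teacher.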