Aligned Weight Regularizers for Pruning Pretrained Neural Networks
James O’ Neill
Findings of the Association for Computational Linguistics: ACL 2022
Pruning aims to reduce the number of parameters while maintaining performance close to the original network. This work proposes a novel self-distillation based pruning strategy, whereby the representational similarity between the pruned and unpruned versions of the same network is maximized. Unlike previous approaches that treat distillation and pruning separately, we use distillation to inform the pruning criteria, without requiring a separate student network as in knowledge distillation. We show that the proposed cross-correlation objective for self-distilled pruning implicitly encourages sparse solutions, naturally complementing magnitude-based pruning criteria. Experiments on the GLUE and XGLUE benchmarks show that self-distilled pruning increases mono- and cross-lingual language model performance. Self-distilled pruned models also outperform smaller Transformers with an equal number of parameters and are competitive against (6 times) larger distilled networks. We also observe that self-distillation (1) maximizes class separability, (2) increases the signal-to-noise ratio, and (3) converges faster after pruning steps, providing further insights into why self-distilled pruning improves generalization.
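The magnitude-based pruning criterion mentioned in the abstract can be illustrated with a minimal sketch. It covers only the generic idea of zeroing the smallest-magnitude weights, not the paper's self-distillation objective; the function name `magnitude_prune` and the `sparsity` parameter are assumptions made for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value.

    `weights` is a flat list of floats; `sparsity` is a fraction in [0, 1].
    Both names are hypothetical and chosen for this sketch only.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Rank weight indices from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    # Keep surviving weights unchanged; set pruned ones to zero.
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    print(magnitude_prune([0.5, -0.1, 2.0, 0.05], 0.5))
```

In practice pruning of this kind is usually applied to tensors, layer by layer or globally; the flat list here just keeps the example dependency-free.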
Do not let the history haunt you: Mitigating Compounding Errors in Conversational Question Answering
James O’ Neill
Proceedings of the Twelfth Language Resources and Evaluation Conference
The Conversational Question Answering (CoQA) task involves answering a sequence of inter-related conversational questions about a contextual paragraph. Although existing approaches employ human-written ground-truth answers for answering conversational questions at test time, in a realistic scenario, the CoQA model will not have any access to ground-truth answers for the previous questions, compelling the model to rely upon its own previously predicted answers for answering the subsequent questions. In this paper, we find that compounding errors occur when using previously predicted answers at test time, significantly lowering the performance of CoQA systems. To solve this problem, we propose a sampling strategy that dynamically selects between target answers and model predictions during training, thereby closely simulating the situation at test time. Further, we analyse the severity of this phenomenon as a function of the question type, conversation length and domain type.
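The idea of choosing between target answers and model predictions when building the conversation history during training can be sketched as follows. This is a minimal illustration, not the paper's procedure: the abstract does not say how the selection probability is set or changed over training, so `gold_prob` is a hypothetical parameter, and the function names are likewise assumptions.

```python
import random


def choose_history_answer(gold_answer, predicted_answer, gold_prob, rng):
    """Return the gold answer with probability `gold_prob`, else the prediction."""
    return gold_answer if rng.random() < gold_prob else predicted_answer


def build_history(gold_answers, predicted_answers, gold_prob, seed=0):
    """Build the answer history for one conversation, one choice per turn.

    `gold_prob` is a hypothetical knob: 1.0 always uses target answers
    (training on ground truth only), 0.0 always uses the model's own
    predictions (the situation at test time).
    """
    rng = random.Random(seed)
    return [
        choose_history_answer(gold, pred, gold_prob, rng)
        for gold, pred in zip(gold_answers, predicted_answers)
    ]


if __name__ == "__main__":
    gold = ["Paris", "in 1889", "Gustave Eiffel"]
    pred = ["Paris", "in 1890", "an engineer"]
    print(build_history(gold, pred, gold_prob=0.5))
```

Lowering `gold_prob` over the course of training would expose the model to more of its own predictions, which is one plausible way to approximate test-time conditions.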
NUIG at EmoInt-2017: BiLSTM and SVR Ensemble to Detect Emotion Intensity
James O’ Neill
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
This paper describes the entry NUIG in the WASSA 2017 (8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis) shared task on emotion recognition. The NUIG system used an SVR (SVM regression) and BLSTM ensemble, utilizing primarily n-grams (for SVR features) and tweet word embeddings (for BLSTM features). Experiments were carried out on several other candidate features, some of which were added to the SVR model. Parameter selection for the SVR model was run as a grid search whilst parameters for the BLSTM model were selected through a non-exhaustive ad-hoc search.