George Karypis


2021

Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing
Haoyu He | Xingjian Shi | Jonas Mueller | Sheng Zha | Mu Li | George Karypis
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing

Knowledge Distillation (KD) offers a natural way to reduce the latency and memory/energy usage of the massive pretrained models that have come to dominate Natural Language Processing (NLP) in recent years. While numerous sophisticated variants of KD algorithms have been proposed for NLP applications, the key factors underpinning optimal distillation performance are often confounded and remain unclear. We aim to identify how different components in the KD pipeline, such as the data augmentation policy, the loss function, and the intermediate representation used to transfer knowledge between teacher and student, affect the resulting performance, and how much the optimal KD pipeline varies across datasets/tasks. To tease apart their effects, we propose Distiller, a meta-KD framework that systematically combines a broad range of techniques across different stages of the KD pipeline, which enables us to quantify each component’s contribution. Within Distiller, we unify commonly used objectives for distillation of intermediate representations under a universal mutual information (MI) objective and propose a class of MI-based objective functions with a better bias/variance trade-off for estimating the MI between the teacher and the student. On a diverse set of NLP datasets, the best Distiller configurations are identified via large-scale hyper-parameter optimization. Our experiments reveal the following: 1) the approach used to distill the intermediate representations is the most important factor in KD performance, 2) among the different objectives for intermediate distillation, the proposed MI-α objective performs best, and 3) data augmentation provides a large boost for small training datasets or small student networks. Moreover, we find that different datasets/tasks prefer different KD algorithms, and thus propose a simple AutoDistiller algorithm that can recommend a good KD pipeline for a new dataset.
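
The finding that intermediate-representation distillation matters most can be made concrete with a minimal sketch. The PyTorch-style loss below combines a supervised task term, the standard soft-target KD term, and a simple intermediate-layer matching term; the function name, the MSE-based intermediate objective, and the weighting hyper-parameters (T, alpha, beta) are illustrative assumptions, not the paper's Distiller framework or its MI-α objective.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits,
                student_hidden, teacher_hidden,
                labels, T=2.0, alpha=0.5, beta=1.0):
        """Illustrative KD objective: task loss + soft-target KD +
        an intermediate-representation matching term (a plain MSE
        stand-in for the MI-based objectives studied in the paper)."""
        # Supervised task loss on the ground-truth labels.
        ce = F.cross_entropy(student_logits, labels)
        # Soft-target distillation: match the teacher's softened
        # output distribution at temperature T.
        kl = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Intermediate distillation: match hidden states directly
        # (a learned projection may be needed if dimensions differ).
        inter = F.mse_loss(student_hidden, teacher_hidden)
        return ce + alpha * kl + beta * inter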

Evaluating Scholarly Impact: Towards Content-Aware Bibliometrics
Saurav Manchanda | George Karypis
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Quantitatively measuring the impact-related aspects of scientific, engineering, and technological (SET) innovations is a fundamental problem with broad applications. Traditional citation-based measures for assessing the impact of innovations and related entities do not take into account the content of the publications. This limits their ability to provide rigorous quality-related metrics, because they cannot account for the reasons that led to a citation. We present approaches that estimate content-aware bibliometrics to quantitatively measure the scholarly impact of a publication. Our approaches assess the impact of a cited publication by the extent to which it informs the citing publication. We introduce a new metric, called the “Content Informed Index” (CII), that uses the content of the paper as a source of distant supervision to quantify how much the cited node informs the citing node. We evaluate the weights estimated by our approach on three manually annotated datasets, where the annotations quantify the extent of information in the citation. In particular, we evaluate how well the ranking imposed by our approach agrees with the ranking imposed by the manual annotations. CII achieves up to a 103% improvement in performance compared to the second-best performing approach.
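
The evaluation protocol, comparing the ranking induced by the estimated citation weights against the ranking from the manual annotations, can be sketched as below. This minimal example uses Kendall's tau as the rank-association measure; the variable values and the choice of Kendall's tau are assumptions for illustration, since the abstract does not specify the association statistic.

    from scipy.stats import kendalltau

    # Hypothetical data for one citing paper: the estimated content-aware
    # weight of each cited paper, and the human-annotated informativeness
    # score (higher = more informative). Values are made up for illustration.
    estimated_weights = [0.71, 0.12, 0.43, 0.05]
    annotated_scores = [3, 1, 2, 1]

    # Rank association between the two orderings: tau near 1 means the
    # content-aware weights order citations the way annotators do.
    tau, p_value = kendalltau(estimated_weights, annotated_scores)
    print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")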