@inproceedings{guo-etal-2021-overview,
    title = "An Overview of Uncertainty Calibration for Text Classification and the Role of Distillation",
    author = "Guo, Han  and
      Pasunuru, Ramakanth  and
      Bansal, Mohit",
    editor = "Rogers, Anna  and
      Calixto, Iacer  and
      Vuli{\'c}, Ivan  and
      Saphra, Naomi  and
      Kassner, Nora  and
      Camburu, Oana-Maria  and
      Bansal, Trapit  and
      Shwartz, Vered",
    booktitle = "Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2021.repl4nlp-1.29/",
    doi = "10.18653/v1/2021.repl4nlp-1.29",
    pages = "289--306",
    abstract = "Recent advances in NLP systems, notably the pretraining-and-finetuning paradigm, have achieved great success in predictive accuracy. However, these systems are usually not well calibrated for uncertainty out-of-the-box. Many recalibration methods have been proposed in the literature for quantifying predictive uncertainty and calibrating model outputs, with varying degrees of complexity. In this work, we present a systematic study of a few of these methods. Focusing on the text classification task and finetuned large pretrained language models, we first show that many of the finetuned models are not well calibrated out-of-the-box, especially when the data come from out-of-domain settings. Next, we compare the effectiveness of a few widely-used recalibration methods (such as ensembles and temperature scaling). Then, we empirically illustrate a connection between distillation and calibration. We view distillation as a regularization term encouraging the student model to output uncertainties that match those of a teacher model. With this insight, we develop simple recalibration methods based on distillation with no additional inference-time cost. We show on the GLUE benchmark that our simple methods can achieve competitive out-of-domain (OOD) calibration performance w.r.t. more expensive approaches. Finally, we include ablations to understand the usefulness of components of our proposed method and examine the transferability of calibration via distillation."
}