Region-dependent temperature scaling for certainty calibration and application to class-imbalanced token classification

Hillary Dawkins, Isar Nejadgholi


Abstract
Certainty calibration is an important goal on the path to interpretability and trustworthy AI. Particularly in the context of human-in-the-loop systems, high-quality low- to mid-range certainty estimates are essential. In the presence of a dominant high-certainty class, for instance the non-entity class in NER problems, existing calibration error measures are completely insensitive to potentially large errors in this certainty region of interest. We introduce a region-balanced calibration error metric that weights all certainty regions equally. When low and mid certainty estimates are taken into account, calibration error is typically larger than previously reported. We introduce a simple extension of temperature scaling, requiring no additional computation, that can reduce both traditional and region-balanced notions of calibration error over existing baselines.
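
To make the abstract's two ideas concrete, the Python/NumPy sketch below implements binned calibration error with an optional equal-per-bin weighting, alongside plain temperature scaling. The function names (apply_temperature, binned_calibration_error) and the region_balanced flag are illustrative assumptions, not the paper's code; the paper's region-dependent extension (choosing the temperature by certainty region) is described in the full text and not reproduced here.

import numpy as np

def apply_temperature(logits, T):
    # Standard temperature scaling: softmax of logits divided by a scalar T > 0.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

def binned_calibration_error(confidences, correct, n_bins=10, region_balanced=False):
    # Calibration error over equal-width confidence bins.
    # region_balanced=False recovers the usual sample-count weighting, which a
    # dominant high-certainty class dominates; region_balanced=True weights every
    # non-empty bin equally (a sketch of the "region-balanced" idea, not
    # necessarily the paper's exact formulation).
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    gaps, weights = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gaps.append(abs(confidences[mask].mean() - correct[mask].mean()))
        weights.append(1.0 if region_balanced else mask.sum() / len(confidences))
    gaps, weights = np.array(gaps), np.array(weights)
    return float((gaps * weights).sum() / weights.sum())

As a usage sketch: probs = apply_temperature(logits, T=1.5); conf = probs.max(axis=-1); hits = (probs.argmax(axis=-1) == labels).astype(float); then binned_calibration_error(conf, hits, region_balanced=True) reports the equal-weighted variant, while region_balanced=False gives the traditional sample-weighted error.
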
Anthology ID:
2022.acl-short.59
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
538–544
URL:
https://aclanthology.org/2022.acl-short.59
DOI:
10.18653/v1/2022.acl-short.59
Cite (ACL):
Hillary Dawkins and Isar Nejadgholi. 2022. Region-dependent temperature scaling for certainty calibration and application to class-imbalanced token classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 538–544, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Region-dependent temperature scaling for certainty calibration and application to class-imbalanced token classification (Dawkins & Nejadgholi, ACL 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.acl-short.59.pdf
Data:
Few-NERD