Posterior Calibrated Training on Sentence Classification Tasks

Taehee Jung, Dongyeop Kang, Hua Cheng, Lucas Mentch, Thomas Schaaf


Abstract
Most classification models work by first predicting a posterior probability distribution over all classes and then selecting the class with the largest estimated probability. In many settings, however, the posterior probability itself (e.g., a 65% chance of having diabetes) gives more reliable information than the final predicted class alone. When these models are shown to be poorly calibrated, most fixes to date have relied on posterior calibration, which rescales the predicted probabilities but often has little impact on final classifications. Here we propose an end-to-end training procedure called posterior calibrated (PosCal) training that directly optimizes the task objective while minimizing the difference between the predicted and empirical posterior probabilities. We show that PosCal not only helps reduce the calibration error but also improves task performance by penalizing performance drops on both objectives. PosCal achieves a task performance gain of about 2.5% and a calibration error reduction of 16.1% on GLUE (Wang et al., 2018) compared to the baseline. On xSLUE (Kang and Hovy, 2019), it achieves comparable task performance with a 13.2% calibration error reduction, but does not outperform the two-stage calibration baseline. PosCal training can easily be extended to any type of classification task as a regularization term. PosCal also has the advantage that it incrementally tracks the statistics needed for the calibration objective during training, making efficient use of large training sets.
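To make the idea of a combined objective concrete, the following is a minimal sketch, not the authors' implementation (their code is in the THEEJUNG/PosCal repository). It assumes that the empirical posterior is estimated by binning predicted probabilities per class, that the calibration term is a mean squared difference, and that the two terms are mixed with a weight `lam`; the function names, `num_bins`, and `lam` are all hypothetical choices made for illustration.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class (the task objective)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def bin_index(p, num_bins):
    """Map a probability in [0, 1] to one of num_bins equal-width bins."""
    return min(int(p * num_bins), num_bins - 1)


def empirical_posteriors(probs, labels, num_bins=10):
    """For each (class, bin), the fraction of examples whose predicted
    probability for that class falls in the bin and whose label is that class.
    Binning is an assumption of this sketch."""
    num_classes = len(probs[0])
    counts = [[0] * num_bins for _ in range(num_classes)]
    hits = [[0] * num_bins for _ in range(num_classes)]
    for p, y in zip(probs, labels):
        for k in range(num_classes):
            b = bin_index(p[k], num_bins)
            counts[k][b] += 1
            hits[k][b] += int(y == k)
    return [
        [hits[k][b] / counts[k][b] if counts[k][b] else 0.0
         for b in range(num_bins)]
        for k in range(num_classes)
    ]


def calibration_penalty(probs, labels, num_bins=10):
    """Mean squared gap between predicted and empirical posteriors."""
    q = empirical_posteriors(probs, labels, num_bins)
    num_classes = len(probs[0])
    total = 0.0
    for p in probs:
        for k in range(num_classes):
            total += (p[k] - q[k][bin_index(p[k], num_bins)]) ** 2
    return total / (len(probs) * num_classes)


def combined_loss(probs, labels, lam=1.0, num_bins=10):
    """Task loss plus a weighted calibration term (lam is hypothetical)."""
    return cross_entropy(probs, labels) + lam * calibration_penalty(
        probs, labels, num_bins
    )


if __name__ == "__main__":
    # Ten examples, each predicted as 90% class 0; only half are class 0,
    # so the model is overconfident and the penalty is nonzero.
    probs = [[0.9, 0.1]] * 10
    labels = [0] * 5 + [1] * 5
    print(calibration_penalty(probs, labels))
    print(combined_loss(probs, labels, lam=1.0))
```

In an actual training loop the penalty would be computed on differentiable model outputs, and the per-bin statistics would presumably be refreshed as training proceeds, which is one way to read the abstract's remark about incrementally tracking the needed statistics.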
Anthology ID:
2020.acl-main.242
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2723–2730
URL:
https://preview.aclanthology.org/icon-24-ingestion/2020.acl-main.242/
DOI:
10.18653/v1/2020.acl-main.242
Cite (ACL):
Taehee Jung, Dongyeop Kang, Hua Cheng, Lucas Mentch, and Thomas Schaaf. 2020. Posterior Calibrated Training on Sentence Classification Tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2723–2730, Online. Association for Computational Linguistics.
Cite (Informal):
Posterior Calibrated Training on Sentence Classification Tasks (Jung et al., ACL 2020)
PDF:
https://preview.aclanthology.org/icon-24-ingestion/2020.acl-main.242.pdf
Video:
http://slideslive.com/38929102
Code
THEEJUNG/PosCal
Data
GLUE