Abstract
We investigate model calibration in the setting of zero-shot cross-lingual transfer with large-scale pre-trained language models. The level of model calibration is an important metric for evaluating the trustworthiness of predictive models, and calibration is essential when natural language models are deployed in critical tasks. We study different post-training calibration methods in structured and unstructured prediction tasks. We find that models trained with data from the source language become less calibrated when applied to the target language, and that calibration errors increase with intrinsic task difficulty and relative sparsity of training data. Moreover, we observe a potential connection between the level of calibration error and an earlier proposed measure of the distance from English to other languages. Finally, our comparison demonstrates that, among the methods considered, Temperature Scaling (TS) generalizes well to distant languages, but TS fails to calibrate the more complex confidence estimates in structured prediction compared to more expressive alternatives like Gaussian Process Calibration.
- Anthology ID:
- 2022.emnlp-main.170
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2648–2674
- URL:
- https://aclanthology.org/2022.emnlp-main.170
- DOI:
- 10.18653/v1/2022.emnlp-main.170
- Cite (ACL):
- Zhengping Jiang, Anqi Liu, and Benjamin Van Durme. 2022. Calibrating Zero-shot Cross-lingual (Un-)structured Predictions. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2648–2674, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Calibrating Zero-shot Cross-lingual (Un-)structured Predictions (Jiang et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2022.emnlp-main.170.pdf
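Temperature Scaling, one of the post-training calibration methods compared in the abstract, can be sketched in a few lines: a single scalar temperature T is fit on held-out validation logits by minimizing negative log-likelihood, then used to rescale logits before the softmax. The sketch below is a generic, illustrative implementation (all function names and the toy data are our own, not from the paper's code):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(logits, labels, T):
    # average negative log-likelihood of the true labels at temperature T
    probs = softmax(logits / T)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels):
    # find the scalar T > 0 that minimizes validation NLL
    res = minimize_scalar(lambda t: nll(logits, labels, t),
                          bounds=(0.05, 10.0), method="bounded")
    return res.x

# toy demo: artificially overconfident logits, so the fitted T exceeds 1
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
logits = rng.normal(size=(200, 3))
logits[np.arange(200), labels] += 1.0   # make the true class mildly favored
logits *= 5.0                           # inflate confidence beyond accuracy
T = fit_temperature(logits, labels)
calibrated_probs = softmax(logits / T)
```

Because TS only rescales each logit vector by one global constant, it preserves the argmax prediction; this monotonicity is also why, per the abstract, it can be too inexpressive for structured prediction, where confidence must be assigned to combinatorial outputs rather than a single softmax.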