GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-Distribution Generalization Perspective

Linyi Yang, Shuibai Zhang, Libo Qin, Yafu Li, Yidong Wang, Hanmeng Liu, Jindong Wang, Xing Xie, Yue Zhang


Abstract
Pre-trained language models (PLMs) are known to improve the generalization performance of natural language understanding models by leveraging large amounts of data during the pre-training phase. However, the out-of-distribution (OOD) generalization problem remains a challenge in many NLP tasks, limiting the real-world deployment of these methods. This paper presents the first attempt at creating a unified benchmark named GLUE-X for evaluating OOD robustness in NLP models, highlighting the importance of OOD robustness and providing insights on how to measure the robustness of a model and how to improve it. The benchmark includes 13 publicly available datasets for OOD testing, and evaluations are conducted on 8 classic NLP tasks over 21 popularly used PLMs. Our findings confirm the need for improved OOD accuracy in NLP tasks, as significant performance degradation was observed in all settings compared to in-distribution (ID) accuracy.
Anthology ID: 2023.findings-acl.806
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12731–12750
URL: https://aclanthology.org/2023.findings-acl.806
DOI: 10.18653/v1/2023.findings-acl.806
Cite (ACL):
Linyi Yang, Shuibai Zhang, Libo Qin, Yafu Li, Yidong Wang, Hanmeng Liu, Jindong Wang, Xing Xie, and Yue Zhang. 2023. GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-Distribution Generalization Perspective. In Findings of the Association for Computational Linguistics: ACL 2023, pages 12731–12750, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-Distribution Generalization Perspective (Yang et al., Findings 2023)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2023.findings-acl.806.pdf