GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics

Fengyu Cai, Xinran Zhao, Hongming Zhang, Iryna Gurevych, Heinz Koeppl


Abstract
Recent advances in measuring hardness-wise properties of data guide language models in sample selection within low-resource scenarios. However, class-specific properties are overlooked for task setup and learning. How do these properties influence model learning, and are they generalizable across datasets? To answer this question, this work formally introduces the concept of class-wise hardness. Experiments across eight natural language understanding (NLU) datasets demonstrate a consistent hardness distribution across learning paradigms, models, and human judgment. Subsequent experiments reveal a notable challenge in measuring such class-wise hardness with the instance-level metrics used in previous work. To address this, we propose GeoHard, which measures class-wise hardness by modeling class geometry in the semantic embedding space. GeoHard surpasses instance-level metrics by over 59 percent in Pearson's correlation when measuring class-wise hardness. Our analysis theoretically and empirically underscores the generality of GeoHard as a fresh perspective on data diagnosis. Additionally, we show how understanding class-wise hardness can practically help improve task learning.
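To make the idea of scoring class hardness from class geometry in an embedding space more concrete, here is a minimal Python sketch under stated assumptions. It is an illustrative proxy, not the GeoHard formula from the paper: the hypothetical `class_hardness` function scores each class as its mean intra-class spread divided by the distance to the nearest other class centroid. The hypothetical `pearson` helper wraps NumPy's `corrcoef` to show how such class-wise scores could be compared against per-class performance.

```python
import numpy as np


def class_hardness(embeddings: np.ndarray, labels: np.ndarray) -> dict:
    """Hypothetical geometry-based class hardness (illustrative, not GeoHard's formula).

    hardness(c) = mean distance of class-c points to their centroid
                  / distance from that centroid to the nearest other class centroid.
    Larger values mean a class is more spread out and closer to its neighbours.
    Assumes integer class labels and at least two classes.
    """
    classes = np.unique(labels)
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    # One centroid (mean embedding) per class label.
    centroids = {c: embeddings[labels == c].mean(axis=0) for c in classes}
    scores = {}
    for c in classes:
        members = embeddings[labels == c]
        # Intra-class spread: average Euclidean distance to the class centroid.
        intra = np.linalg.norm(members - centroids[c], axis=1).mean()
        # Inter-class separation: distance to the closest other class centroid.
        inter = min(np.linalg.norm(centroids[c] - centroids[o]) for o in classes if o != c)
        scores[int(c)] = float(intra / inter)
    return scores


def pearson(x, y) -> float:
    """Pearson correlation between two equal-length sequences of per-class values."""
    return float(np.corrcoef(x, y)[0, 1])
```

In practice, `embeddings` would come from a sentence encoder applied to each example. The per-class scores could then be passed to `pearson` together with per-class accuracy or F1 from a trained classifier. If higher hardness tracks lower performance, the correlation should come out negative. This proxy is only one of many ways to reduce class geometry to a single number.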
Anthology ID:
2024.findings-acl.332
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5571–5597
URL:
https://aclanthology.org/2024.findings-acl.332
Cite (ACL):
Fengyu Cai, Xinran Zhao, Hongming Zhang, Iryna Gurevych, and Heinz Koeppl. 2024. GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics. In Findings of the Association for Computational Linguistics ACL 2024, pages 5571–5597, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics (Cai et al., Findings 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.findings-acl.332.pdf