GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics
Fengyu Cai, Xinran Zhao, Hongming Zhang, Iryna Gurevych, Heinz Koeppl
Abstract
Recent advances in measuring hardness-wise properties of data guide language models in sample selection within low-resource scenarios. However, class-specific properties are overlooked for task setup and learning. How will these properties influence model learning, and are they generalizable across datasets? To answer this question, this work formally initiates the concept of class-wise hardness. Experiments across eight natural language understanding (NLU) datasets demonstrate a consistent hardness distribution across learning paradigms, models, and human judgment. Subsequent experiments unveil a notable challenge in measuring such class-wise hardness with instance-level metrics in previous works. To address this, we propose GeoHard for class-wise hardness measurement by modeling class geometry in the semantic embedding space. GeoHard surpasses instance-level metrics by over 59 percent on Pearson's correlation on measuring class-wise hardness. Our analysis theoretically and empirically underscores the generality of GeoHard as a fresh perspective on data diagnosis. Additionally, we showcase how understanding class-wise hardness can practically aid in improving task learning.
- Anthology ID:
- 2024.findings-acl.332
- Volume:
- Findings of the Association for Computational Linguistics ACL 2024
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand and virtual meeting
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5571–5597
- URL:
- https://aclanthology.org/2024.findings-acl.332
- Cite (ACL):
- Fengyu Cai, Xinran Zhao, Hongming Zhang, Iryna Gurevych, and Heinz Koeppl. 2024. GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics. In Findings of the Association for Computational Linguistics ACL 2024, pages 5571–5597, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
- Cite (Informal):
- GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics (Cai et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.findings-acl.332.pdf