Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers

Shane Storks, Joyce Chai


Abstract
As large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks, statistical biases in benchmark data and findings from probing studies have recently called their true capabilities into question. For a more informative evaluation than accuracy on text classification tasks can offer, we propose evaluating systems through a novel measure of prediction coherence. We apply our framework to two existing language understanding benchmarks with different properties to demonstrate its versatility. Our experimental results show that this evaluation framework, although simple in concept and implementation, offers a quick, effective, and versatile way to gain insight into the coherence of machines’ predictions.
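
To make the idea of "prediction coherence" concrete, below is a minimal sketch of a coherence-style evaluation in Python. It assumes a setup where each example carries a top-level label prediction plus lower-level supporting predictions (e.g., evidence or sub-task outputs), and counts an example as coherent only when the label and all of its supports are correct. The data structure, field names, and the exact aggregation are illustrative assumptions, not the paper's precise definition.

    # Sketch of a coherence-style evaluation: compare plain accuracy with an
    # accuracy that also requires the supporting predictions to be correct.
    # All names and the coherence criterion here are hypothetical.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Prediction:
        label: int                 # predicted top-level class
        gold_label: int            # gold top-level class
        support: List[int]         # predicted lower-level outputs
        gold_support: List[int]    # gold lower-level outputs

    def accuracy(preds: List[Prediction]) -> float:
        """Standard accuracy: fraction of correct top-level labels."""
        return sum(p.label == p.gold_label for p in preds) / len(preds)

    def coherent_accuracy(preds: List[Prediction]) -> float:
        """Fraction of examples whose top-level label is correct AND whose
        supporting predictions all match the gold supports, i.e. the
        prediction is 'coherent' with its evidence (illustrative criterion)."""
        def coherent(p: Prediction) -> bool:
            return p.label == p.gold_label and p.support == p.gold_support
        return sum(coherent(p) for p in preds) / len(preds)

    if __name__ == "__main__":
        preds = [
            Prediction(label=1, gold_label=1, support=[1, 0], gold_support=[1, 0]),
            Prediction(label=1, gold_label=1, support=[0, 0], gold_support=[1, 0]),
            Prediction(label=0, gold_label=1, support=[1, 0], gold_support=[1, 0]),
        ]
        print(f"accuracy:          {accuracy(preds):.2f}")           # 0.67
        print(f"coherent accuracy: {coherent_accuracy(preds):.2f}")  # 0.33

The gap between the two numbers in the toy run illustrates the paper's motivating point: a model can look strong under plain accuracy while its predictions are not coherently supported.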
Anthology ID:
2021.findings-emnlp.272
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3169–3177
URL:
https://aclanthology.org/2021.findings-emnlp.272
DOI:
10.18653/v1/2021.findings-emnlp.272
Cite (ACL):
Shane Storks and Joyce Chai. 2021. Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3169–3177, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers (Storks & Chai, Findings 2021)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2021.findings-emnlp.272.pdf
Video:
https://preview.aclanthology.org/naacl-24-ws-corrections/2021.findings-emnlp.272.mp4
Code
sled-group/verifiable-coherent-nlu
Data
MultiNLI