Differential Evaluation: a Qualitative Analysis of Natural Language Processing System Behavior Based Upon Data Resistance to Processing
Lucie Gianola, Hicham El Boukkouri, Cyril Grouin, Thomas Lavergne, Patrick Paroubek, Pierre Zweigenbaum
Abstract
Most of the time, when dealing with a particular Natural Language Processing task, systems are compared on the basis of global statistics such as recall, precision, F1-score, etc. While such scores provide a general idea of the behavior of these systems, they ignore a key piece of information that can be useful for assessing progress and discerning remaining challenges: the relative difficulty of test instances. To address this shortcoming, we introduce the notion of differential evaluation which effectively defines a pragmatic partition of instances into gradually more difficult bins by leveraging the predictions made by a set of systems. Comparing systems along these difficulty bins enables us to produce a finer-grained analysis of their relative merits, which we illustrate on two use-cases: a comparison of systems participating in a multi-label text classification task (CLEF eHealth 2018 ICD-10 coding), and a comparison of neural models trained for biomedical entity detection (BioCreative V chemical-disease relations dataset).
- Anthology ID: 2021.eval4nlp-1.1
- Volume: Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
- Month: November
- Year: 2021
- Address: Punta Cana, Dominican Republic
- Venue: Eval4NLP
- Publisher: Association for Computational Linguistics
- Pages: 1–10
- URL: https://aclanthology.org/2021.eval4nlp-1.1
- DOI: 10.18653/v1/2021.eval4nlp-1.1
- Cite (ACL): Lucie Gianola, Hicham El Boukkouri, Cyril Grouin, Thomas Lavergne, Patrick Paroubek, and Pierre Zweigenbaum. 2021. Differential Evaluation: a Qualitative Analysis of Natural Language Processing System Behavior Based Upon Data Resistance to Processing. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 1–10, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): Differential Evaluation: a Qualitative Analysis of Natural Language Processing System Behavior Based Upon Data Resistance to Processing (Gianola et al., Eval4NLP 2021)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2021.eval4nlp-1.1.pdf
- Data: MIMIC-III
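
The abstract describes partitioning test instances into gradually more difficult bins from the predictions of a set of systems. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes that an instance's difficulty is simply the number of systems that fail on it, and the names `difficulty_bins` and `accuracy_per_bin` are hypothetical. The exact binning criterion used in the paper is not given on this page.

```python
from collections import defaultdict


def difficulty_bins(gold, predictions):
    """Group instance ids by the number of systems that mispredict them.

    gold: dict mapping instance id -> reference label
    predictions: dict mapping system name -> dict of instance id -> predicted label
    Assumption: difficulty = count of failing systems (0 = solved by all).
    """
    bins = defaultdict(list)
    for instance_id, label in gold.items():
        n_failed = sum(
            1
            for system_preds in predictions.values()
            if system_preds.get(instance_id) != label
        )
        bins[n_failed].append(instance_id)
    return dict(bins)


def accuracy_per_bin(gold, system_preds, bins):
    """Accuracy of a single system inside each difficulty bin."""
    return {
        level: sum(system_preds.get(i) == gold[i] for i in ids) / len(ids)
        for level, ids in sorted(bins.items())
    }


# Toy data, invented for illustration only.
gold = {"a": 1, "b": 0, "c": 1}
predictions = {
    "s1": {"a": 1, "b": 0, "c": 0},
    "s2": {"a": 1, "b": 1, "c": 0},
}
bins = difficulty_bins(gold, predictions)
s1_scores = accuracy_per_bin(gold, predictions["s1"], bins)
```

Reporting a per-bin score for each system, rather than a single global figure, is what makes it possible to see whether two systems with similar overall scores differ on the instances that most systems get wrong.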