A Robustness Evaluation Framework for Argument Mining

Mehmet Sofi, Matteo Fortier, Oana Cocarascu


Abstract
Standard practice for evaluating the performance of machine learning models for argument mining is to report metrics such as accuracy or F1. However, little is usually known about a model's stability and consistency when deployed in real-world settings. In this paper, we propose a robustness evaluation framework to guide the design of rigorous argument mining models. As part of the framework, we introduce several novel robustness tests tailored specifically to argument mining tasks. Additionally, we integrate existing robustness tests designed for other natural language processing tasks and repurpose them for argument mining. Finally, we illustrate the utility of our framework on two widely used argument mining corpora, UKP topic-sentences and IBM Debater Evidence Sentence. We argue that our framework should be used in conjunction with standard performance evaluation techniques as a measure of model stability.
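The paper describes its tests in detail; purely as an illustration of the kind of robustness check the abstract refers to, the following minimal Python sketch measures how often a classifier's prediction flips when its input receives small typo-style perturbations. The perturbation (adjacent-character swaps), the flip-rate metric, and the toy classifier are hypothetical stand-ins, not the authors' actual tests or code.

import random
from typing import Callable, Sequence

def swap_adjacent_chars(text: str, rate: float = 0.1, seed: int = 0) -> str:
    # Introduce typo-style noise by swapping random adjacent letter pairs.
    # This is one common NLP perturbation; the paper's own tests may differ.
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def flip_rate(model: Callable[[str], int], sentences: Sequence[str]) -> float:
    # Fraction of sentences whose predicted label changes under perturbation.
    # Lower is better: a stable model should not flip on small input noise.
    flips = sum(model(s) != model(swap_adjacent_chars(s)) for s in sentences)
    return flips / len(sentences)

if __name__ == "__main__":
    # Toy stand-in classifier: predicts "argumentative" (1) if a cue word appears.
    toy_model = lambda s: int(any(w in s.lower() for w in ("because", "therefore")))
    data = ["We should ban it because it is harmful.",
            "The sky was grey all morning."]
    print(f"prediction flip rate: {flip_rate(toy_model, data):.2f}")

Tracking a number like this flip rate alongside accuracy or F1 is the kind of complementary stability measure the abstract advocates.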
Anthology ID:
2022.argmining-1.16
Volume:
Proceedings of the 9th Workshop on Argument Mining
Month:
October
Year:
2022
Address:
Online and in Gyeongju, Republic of Korea
Editors:
Gabriella Lapesa, Jodi Schneider, Yohan Jo, Sougata Saha
Venue:
ArgMining
Publisher:
International Conference on Computational Linguistics
Pages:
171–180
URL:
https://aclanthology.org/2022.argmining-1.16
Cite (ACL):
Mehmet Sofi, Matteo Fortier, and Oana Cocarascu. 2022. A Robustness Evaluation Framework for Argument Mining. In Proceedings of the 9th Workshop on Argument Mining, pages 171–180, Online and in Gyeongju, Republic of Korea. International Conference on Computational Linguistics.
Cite (Informal):
A Robustness Evaluation Framework for Argument Mining (Sofi et al., ArgMining 2022)
PDF:
https://aclanthology.org/2022.argmining-1.16.pdf