Terne Sasha Thorn Jakobsen


The Sensitivity of Annotator Bias to Task Definitions in Argument Mining
Terne Sasha Thorn Jakobsen | Maria Barrett | Anders Søgaard | David Lassen
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022

NLP models are dependent on the data they are trained on, including how this data is annotated. NLP research increasingly examines the social biases of models, but often in light of their training data and of specific social biases that can be identified in the text itself. In this paper, we present an annotation experiment that is the first to examine the extent to which social bias is sensitive to how data is annotated. We do so by collecting annotations of arguments in the same documents following four different guidelines and from annotators of four different demographic backgrounds. We show that annotations exhibit widely different levels of group disparity depending on which guidelines annotators follow. The differences are not explained by task complexity, but rather by characteristics of these demographic groups, as previously identified by sociological studies. We release a dataset that is small in the number of instances but large in the number of annotations with demographic information, and our results encourage an increased awareness of annotator bias.


Spurious Correlations in Cross-Topic Argument Mining
Terne Sasha Thorn Jakobsen | Maria Barrett | Anders Søgaard
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

Recent work in cross-topic argument mining attempts to learn models that generalise across topics rather than merely relying on within-topic spurious correlations. We examine the effectiveness of this approach by analysing the output of single-task and multi-task models for cross-topic argument mining, through a combination of linear approximations of their decision boundaries, manual feature grouping, challenge examples, and ablations across the input vocabulary. Surprisingly, we show that cross-topic models still rely mostly on spurious correlations and only generalise within closely related topics; for example, a model trained only on closed-class words and a few common open-class words outperforms a state-of-the-art cross-topic model on distant target topics.