The Viability of Best-worst Scaling and Categorical Data Label Annotation Tasks in Detecting Implicit Bias

Parker Glenn; Cassandra L. Jacobs; Marvin Thielk; Yi Chu

Abstract

Annotating workplace bias in text is a noisy and subjective task. In encoding the inherently continuous nature of bias, aggregated binary classifications do not suffice. Best-worst scaling (BWS) offers a framework to obtain real-valued scores through a series of comparative evaluations, but it is often impractical to deploy to traditional annotation pipelines within industry. We present analyses of a small-scale bias dataset, jointly annotated with categorical annotations and BWS annotations. We show that there is a strong correlation between observed agreement and BWS score (Spearman’s r=0.72). We identify several shortcomings of BWS relative to traditional categorical annotation: (1) When compared to categorical annotation, we estimate BWS takes approximately 4.5x longer to complete; (2) BWS does not scale well to large annotation tasks with sparse target phenomena; (3) The high correlation between BWS and the traditional task shows that the benefits of BWS can be recovered from a simple categorically annotated, non-aggregated dataset.
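The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the common counting procedure for BWS, where an item's score is the number of times it was picked as best minus the number of times it was picked as worst, divided by the number of times it appeared. The abstract does not say which scoring procedure the authors used. "Observed agreement" is taken here to mean the fraction of categorical annotators who chose the "biased" label, which is also an assumption. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import mean


def bws_scores(trials):
    """Counting-based BWS score per item: (#best - #worst) / #appearances.

    `trials` is an iterable of (items, best, worst) tuples, one per
    annotated tuple of items (hypothetical input format).
    """
    best, worst, seen = Counter(), Counter(), Counter()
    for items, b, w in trials:
        seen.update(items)
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            ranks[k] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (invented for illustration): three BWS trials over four sentences,
# where "best" means most biased and "worst" means least biased.
trials = [
    (("s1", "s2", "s3", "s4"), "s1", "s4"),
    (("s1", "s2", "s3", "s4"), "s1", "s3"),
    (("s1", "s2", "s3", "s4"), "s2", "s4"),
]
# Assumed fraction of categorical annotators who labeled each sentence biased.
agreement = {"s1": 1.0, "s2": 0.67, "s3": 0.33, "s4": 0.0}

scores = bws_scores(trials)
items = sorted(scores)
rho = spearman([scores[i] for i in items], [agreement[i] for i in items])
print(scores, rho)
```

In this toy setup the non-aggregated categorical labels already induce the same ranking as the BWS scores, which is the kind of relationship the reported correlation points to.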

Anthology ID: 2022.nlperspectives-1.5
Volume: Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022
Month: June
Year: 2022
Address: Marseille, France
Venue: NLPerspectives
Publisher: European Language Resources Association
Pages: 32–36
URL: https://aclanthology.org/2022.nlperspectives-1.5
Cite (ACL): Parker Glenn, Cassandra L. Jacobs, Marvin Thielk, and Yi Chu. 2022. The Viability of Best-worst Scaling and Categorical Data Label Annotation Tasks in Detecting Implicit Bias. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, pages 32–36, Marseille, France. European Language Resources Association.
Cite (Informal): The Viability of Best-worst Scaling and Categorical Data Label Annotation Tasks in Detecting Implicit Bias (Glenn et al., NLPerspectives 2022)
PDF: https://preview.aclanthology.org/paclic-22-ingestion/2022.nlperspectives-1.5.pdf