Carolyne Pelletier


2024

Biasly: An Expert-Annotated Dataset for Subtle Misogyny Detection and Mitigation
Brooklyn Sheppard | Anna Richter | Allison Cohen | Elizabeth Smith | Tamara Kneese | Carolyne Pelletier | Ioana Baldini | Yue Dong
Findings of the Association for Computational Linguistics: ACL 2024

Using novel approaches to dataset development, the Biasly dataset captures the nuance and subtlety of misogyny in ways that are unique within the literature. Built in collaboration with multi-disciplinary experts and annotators themselves, the dataset contains annotations of movie subtitles, capturing colloquial expressions of misogyny in North American film. The open-source dataset can be used for a range of NLP tasks, including binary and multi-label classification, severity score regression, and text generation for rewrites. In this paper, we discuss the methodology used, analyze the annotations obtained, provide baselines for each task using common NLP algorithms, and furnish error analyses to give insight into model behaviour when fine-tuned on the Biasly dataset.

2019

Proposed Taxonomy for Gender Bias in Text; A Filtering Methodology for the Gender Generalization Subtype
Yasmeen Hitti | Eunbee Jang | Ines Moreno | Carolyne Pelletier
Proceedings of the First Workshop on Gender Bias in Natural Language Processing

The purpose of this paper is to present an empirical study on gender bias in text. Current research in this field focuses on detecting and correcting gender bias in existing machine learning models rather than approaching the issue at the dataset level. The underlying motivation is to create a dataset that could enable machines to learn to differentiate biased writing from non-biased writing. A taxonomy is proposed for structural and contextual gender biases that can manifest themselves in text. A methodology is proposed to fetch one type of structural gender bias, Gender Generalization. We explore the IMDB movie review dataset and 9 different corpora from Project Gutenberg. After irrelevant sentences are filtered out, the remaining pool of candidate sentences is sent for human validation. A total of 6123 judgments are made on 1627 sentences, and after a quality check on randomly selected sentences we obtain an accuracy of 75%. Out of the 1627 sentences, 808 sentences were labeled as Gender Generalizations. The inter-rater reliability amongst labelers was 61.14%.