Sebastiano Vecellio Salto
2025
Job Unfair: An Investigation of Gender and Occupational Bias in Free-Form Text Completions by LLMs
Camilla Casula | Sebastiano Vecellio Salto | Elisa Leonardelli | Sara Tonelli
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Disentangling how gender and occupations are encoded by LLMs is crucial to identify possible biases and prevent harms, especially given the widespread use of LLMs in sensitive domains such as human resources. In this work, we carry out an in-depth investigation of gender and occupational biases in English and Italian as expressed by 9 different LLMs (both base and instruction-tuned). Specifically, we focus on the analysis of sentence completions when LLMs are prompted with job-related sentences including different gender representations. We carry out a manual analysis of 4,500 generated texts along 4 dimensions that can reflect bias, propose a novel embedding-based method to investigate biases in generated texts, and, finally, carry out a lexical analysis of the model completions. In our qualitative and quantitative evaluation, we show that many facets of social bias remain unaccounted for even in aligned models, and that LLMs in general still reflect existing gender biases in both languages. Finally, we find that models still struggle with gender-neutral expressions, especially beyond English.
2024
Delving into Qualitative Implications of Synthetic Data for Hate Speech Detection
Camilla Casula | Sebastiano Vecellio Salto | Alan Ramponi | Sara Tonelli
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The use of synthetic data for training models for a variety of NLP tasks is now widespread. However, previous work reports mixed results with regard to its effectiveness on highly subjective tasks such as hate speech detection. In this paper, we present an in-depth qualitative analysis of the potential and specific pitfalls of synthetic data for hate speech detection in English, with 3,500 manually annotated examples. We show that, across different models, synthetic data created through paraphrasing gold texts can improve out-of-distribution robustness from a computational standpoint. However, this comes at a cost: synthetic data fails to reliably reflect the characteristics of real-world data on a number of linguistic dimensions, it results in drastically different class distributions, and it heavily reduces the representation of both specific identity groups and intersectional hate.