Chiara Manna

2026

Gender Disambiguation in Machine Translation: Diagnostic Evaluation in Decoder-Only Architectures
Chiara Manna | Hosein Mohebbi | Afra Alishahi | Frédéric Blain | Eva Vanmassenhove
Proceedings of the Fifteenth Language Resources and Evaluation Conference

While Large Language Models achieve state-of-the-art results across a wide range of NLP tasks, they remain prone to systematic biases. Among these, gender bias is particularly salient in machine translation (MT), because languages differ systematically in whether and how they mark gender. As a result, translation often requires disambiguating implicit source signals into explicit gender-marked forms. In this context, standard benchmarks may capture broad disparities but fail to reflect the full complexity of gender bias in modern MT. In this paper, we extend recent frameworks on bias evaluation by: (i) introducing a novel measure, ‘Prior Bias’, which captures a model’s default gender assumptions, and (ii) applying the framework to decoder-only MT models. Our results show that, despite their scale and state-of-the-art status, decoder-only models do not generally outperform encoder-decoder architectures on gender-specific metrics; however, post-training (e.g., instruction tuning) not only improves contextual awareness but also reduces the masculine Prior Bias.

2025

Proceedings of the 3rd Workshop on Gender-Inclusive Translation Technologies (GITT 2025)
Janiça Hackenbuchner | Luisa Bentivogli | Joke Daems | Chiara Manna | Beatrice Savoldi | Eva Vanmassenhove
Proceedings of the 3rd Workshop on Gender-Inclusive Translation Technologies (GITT 2025)

Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation
Chiara Manna | Afra Alishahi | Frédéric Blain | Eva Vanmassenhove
Proceedings of the 3rd Workshop on Gender-Inclusive Translation Technologies (GITT 2025)

While gender bias in modern Neural Machine Translation (NMT) systems has received much attention, the traditional evaluation metrics for these systems do not fully capture the extent to which models integrate contextual gender cues. We propose a novel evaluation metric, Minimal Pair Accuracy (MPA), which measures how much models rely on gender cues for gender disambiguation. Evaluating a number of NMT models with this metric, we show that in most cases they ignore available gender cues in favour of a (statistically) stereotypical gender interpretation. We further show that in anti-stereotypical cases, these models tend to take male gender cues into account more consistently while ignoring female cues. Finally, we analyze the attention head weights in the encoder component of these models and show that, while all models encode gender information to some extent, male gender cues elicit a more diffused response than the more concentrated and specialized responses to female gender cues.

Co-authors

Janiça Hackenbuchner 1

Hosein Mohebbi 1

Beatrice Savoldi 1

Venues

GITT 2
LREC 1