Leveraging Pre-trained Language Models for Gender Debiasing

Nishtha Jain, Declan Groves, Lucia Specia, Maja Popović


Abstract
Studying and mitigating gender and other biases in natural language have become important areas of research from both algorithmic and data perspectives. This paper explores reducing gender bias in a language generation context by generating gender variants of sentences. Previous work in this field has been either rule-based or dependent on large amounts of gender-balanced training data. These approaches, however, do not scale across multiple languages, as creating data or rules for each language is costly and time-consuming. This work explores a lightweight method that generates gender variants for a given text using pre-trained language models as the resource, without any task-specific labelled data. The approach is designed to work on multiple languages with minimal changes in the form of heuristics. To demonstrate this, we tested it on a high-resource language, Spanish, and a low-resource language from a different family, Serbian. The approach worked very well on Spanish; the results were less positive for Serbian, but it showed potential even for languages where pre-trained models are less effective.
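The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way the general idea could look: a masked language model proposes replacements for each token, and a small language-specific heuristic keeps only those that look like gender counterparts. The names `FillMask`, `is_gender_counterpart`, and `gender_variant`, as well as the -o/-a ending rule for Spanish, are assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, List

# Hypothetical interface: given a token list and a position to mask, return
# candidate fillers ranked best-first (e.g. produced by a masked language model).
FillMask = Callable[[List[str], int], List[str]]


def is_gender_counterpart(word: str, cand: str) -> bool:
    """Illustrative Spanish heuristic (an assumption): same stem, final -o/-a swapped."""
    if len(word) < 2 or len(cand) != len(word) or word[:-1] != cand[:-1]:
        return False
    return {word[-1], cand[-1]} == {"o", "a"}


def gender_variant(tokens: List[str], fill_mask: FillMask) -> List[str]:
    """Replace each token with the top-ranked candidate that passes the heuristic."""
    out = list(tokens)
    for i, tok in enumerate(tokens):
        # Query on the partially rewritten sentence so earlier swaps are visible
        # to the model when it ranks candidates for later positions.
        for cand in fill_mask(out, i):
            if is_gender_counterpart(tok, cand):
                out[i] = cand
                break
    return out
```

In this sketch the only language-specific part is `is_gender_counterpart`, which mirrors the abstract's point that moving to another language should require only a change of heuristics. The rule shown is deliberately crude (it ignores articles and would accept unrelated word pairs that differ only in a final -o/-a), and `fill_mask` would need to be backed by an actual pre-trained model.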
Anthology ID:
2022.lrec-1.235
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2188–2195
URL:
https://aclanthology.org/2022.lrec-1.235
Cite (ACL):
Nishtha Jain, Declan Groves, Lucia Specia, and Maja Popović. 2022. Leveraging Pre-trained Language Models for Gender Debiasing. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2188–2195, Marseille, France. European Language Resources Association.
Cite (Informal):
Leveraging Pre-trained Language Models for Gender Debiasing (Jain et al., LREC 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2022.lrec-1.235.pdf