Gender Bias Mitigation for NMT Involving Genderless Languages

Ander Corral; Xabier Saralegi

Abstract

NMT systems have been found to strongly prefer social defaults and biases when translating certain occupations; because these systems are so widely used, they can unintentionally amplify and perpetuate these patterns. This work therefore focuses on sentence-level gender agreement between gendered entities and occupations when translating from genderless languages into languages with grammatical gender. Specifically, we address the Basque-to-Spanish translation direction, for which bias mitigation has not been addressed. Gender information in Basque is explicit in neither the grammar nor the morphology; it is present only in a limited number of gender-specific common nouns and personal proper names. We propose a template-based fine-tuning strategy with explicit gender tags to provide a stronger gender signal for the proper inflection of occupations. This strategy is compared against systems fine-tuned on real data extracted from Wikipedia biographies. We provide a detailed gender bias assessment and perform a template ablation study to determine the optimal set of templates. We report substantial gender bias mitigation (up to 50% on gender bias scores) while preserving the original translation quality.
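
The template-based idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the tag strings `<F>` and `<M>`, their position at the start of the source sentence, the single template, and the small lists of names and occupations. The sketch only shows how tagged Basque source sentences might be paired with Spanish targets whose occupation noun is inflected to match the gender of the named person.

```python
from itertools import product

# Hypothetical gender tags prepended to the source sentence (assumed format).
TAGS = {"f": "<F>", "m": "<M>"}

# Illustrative data only: names with an assumed gender, and occupations as
# (Basque, Spanish masculine, Spanish feminine).
NAMES = [("Ane", "f"), ("Mikel", "m")]
OCCUPATIONS = [
    ("medikua", "médico", "médica"),
    ("erizaina", "enfermero", "enfermera"),
    ("irakaslea", "profesor", "profesora"),
]

# One hypothetical template pair meaning "{name} is a {occupation}".
SRC_TEMPLATE = "{tag} {name} {occ} da."
TGT_TEMPLATE = "{name} es {occ}."


def build_pairs():
    """Return (tagged Basque source, Spanish target) fine-tuning pairs."""
    pairs = []
    for (name, gender), (eu, es_m, es_f) in product(NAMES, OCCUPATIONS):
        # Pick the Spanish form that agrees with the gender of the name.
        es = es_f if gender == "f" else es_m
        src = SRC_TEMPLATE.format(tag=TAGS[gender], name=name, occ=eu)
        tgt = TGT_TEMPLATE.format(name=name, occ=es)
        pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    for src, tgt in build_pairs():
        print(f"{src}\t{tgt}")
```

Because the Basque occupation noun is identical for both genders, the tag and the proper name are the only gender cues on the source side in this sketch; the target side carries the inflection the model is meant to learn.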

Anthology ID: 2022.wmt-1.10
Volume: Proceedings of the Seventh Conference on Machine Translation (WMT)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 165–176
URL: https://preview.aclanthology.org/ingest-emnlp/2022.wmt-1.10/
Cite (ACL): Ander Corral and Xabier Saralegi. 2022. Gender Bias Mitigation for NMT Involving Genderless Languages. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 165–176, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): Gender Bias Mitigation for NMT Involving Genderless Languages (Corral & Saralegi, WMT 2022)
PDF: https://preview.aclanthology.org/ingest-emnlp/2022.wmt-1.10.pdf
Video: https://preview.aclanthology.org/ingest-emnlp/2022.wmt-1.10.mp4
