Adrià de Jorge


Fine-tuning Neural Machine Translation on Gender-Balanced Datasets
Marta R. Costa-jussà | Adrià de Jorge
Proceedings of the Second Workshop on Gender Bias in Natural Language Processing

Misrepresentation of certain communities in datasets is causing serious disruptions in artificial intelligence applications. In this paper, we propose using a gender-balanced parallel corpus automatically extracted from Wikipedia. This balanced set is used to fine-tune a larger model trained on unbalanced datasets, with the aim of mitigating gender bias in neural machine translation.
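
The abstract does not say how the balanced set is built beyond its being extracted automatically from Wikipedia. As a rough illustration only, the sketch below assumes each sentence pair already carries a gender label and balances the set by downsampling every label to the size of the smallest group. The function name `balance_by_gender`, the `(source, target, gender)` tuple layout, and the downsampling strategy are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def balance_by_gender(pairs, seed=0):
    """Downsample labeled sentence pairs so every gender label is equally frequent.

    `pairs` is a list of (source, target, gender_label) tuples. This layout is
    an assumption made for illustration, not the paper's actual data format.
    """
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair[2]].append(pair)
    if not groups:
        return []

    # Every label is cut down to the size of the smallest group.
    target_size = min(len(group) for group in groups.values())

    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], target_size))

    # Shuffle so that labels are interleaved rather than grouped.
    rng.shuffle(balanced)
    return balanced
```

A set balanced this way could then serve as fine-tuning data for a model trained beforehand on the larger, unbalanced corpus. Downsampling discards data, so it is only one of several possible ways to reach balance.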