Abstract
Misrepresentation of certain communities in datasets is causing big disruptions in artificial intelligence applications. In this paper, we propose using a gender-balanced parallel corpus automatically extracted from Wikipedia. This balanced set is used to fine-tune a bigger model trained on unbalanced datasets, in order to mitigate gender biases in neural machine translation.
- Anthology ID:
- 2020.gebnlp-1.3
- Volume:
- Proceedings of the Second Workshop on Gender Bias in Natural Language Processing
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Marta R. Costa-jussà, Christian Hardmeier, Will Radford, Kellie Webster
- Venue:
- GeBNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 26–34
- URL:
- https://aclanthology.org/2020.gebnlp-1.3
- Cite (ACL):
- Marta R. Costa-jussà and Adrià de Jorge. 2020. Fine-tuning Neural Machine Translation on Gender-Balanced Datasets. In Proceedings of the Second Workshop on Gender Bias in Natural Language Processing, pages 26–34, Barcelona, Spain (Online). Association for Computational Linguistics.
- Cite (Informal):
- Fine-tuning Neural Machine Translation on Gender-Balanced Datasets (Costa-jussà & de Jorge, GeBNLP 2020)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2020.gebnlp-1.3.pdf