Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation
Bar Iluz, Tomasz Limisiewicz, Gabriel Stanovsky, David Mareček
- Anthology ID: 2023.ijcnlp-main.57
- Volume: Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: November
- Year: 2023
- Address: Nusa Dua, Bali
- Editors: Jong C. Park, Yuki Arase, Baotian Hu, Wei Lu, Derry Wijaya, Ayu Purwarianti, Adila Alfa Krisnadhi
- Venues: IJCNLP | AACL
- Publisher: Association for Computational Linguistics
- Pages: 885–896
- URL: https://preview.aclanthology.org/ingest_wac_2008/2023.ijcnlp-main.57/
- DOI: 10.18653/v1/2023.ijcnlp-main.57
- Cite (ACL): Bar Iluz, Tomasz Limisiewicz, Gabriel Stanovsky, and David Mareček. 2023. Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 885–896, Nusa Dua, Bali. Association for Computational Linguistics.
- Cite (Informal): Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation (Iluz et al., IJCNLP-AACL 2023)
- PDF: https://preview.aclanthology.org/ingest_wac_2008/2023.ijcnlp-main.57.pdf