Abstract
This paper introduces a taxonomy of phenomena which cause bias in machine translation, covering gender bias (people being male and/or female), number bias (singular you versus plural you) and formality bias (informal you versus formal you). Our taxonomy is a formalism for describing situations in machine translation when the source text leaves some of these properties unspecified (e.g. does not say whether a doctor is male or female) but the target language requires the property to be specified (e.g. because it does not have a gender-neutral word for doctor). The formalism described here is used internally by a web-based tool we have built for detecting and correcting bias in the output of any machine translator.

- Anthology ID:
- 2022.gebnlp-1.18
- Volume:
- Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, Washington
- Editors:
- Christian Hardmeier, Christine Basta, Marta R. Costa-jussà, Gabriel Stanovsky, Hila Gonen
- Venue:
- GeBNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 168–173
- URL:
- https://aclanthology.org/2022.gebnlp-1.18
- DOI:
- 10.18653/v1/2022.gebnlp-1.18
- Cite (ACL):
- Michal Měchura. 2022. A Taxonomy of Bias-Causing Ambiguities in Machine Translation. In Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 168–173, Seattle, Washington. Association for Computational Linguistics.
- Cite (Informal):
- A Taxonomy of Bias-Causing Ambiguities in Machine Translation (Měchura, GeBNLP 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2022.gebnlp-1.18.pdf