Gendered Grammar or Ingrained Bias? Exploring Gender Bias in Icelandic Language Models

Steinunn Rut Friðriksdóttir; Hafsteinn Einarsson

Abstract

Large language models, trained on vast datasets, exhibit increased output quality in proportion to the amount of data that is used to train them. This data-driven learning process has brought forth a pressing issue where these models may not only reflect but also amplify gender bias, racism, religious prejudice, and queerphobia present in their training data, which may not always be recent. This study explores gender bias in language models trained on Icelandic, focusing on occupation-related terms. Icelandic is a highly grammatically gendered language that favors the masculine when referring to groups of people with indeterminable genders. Our aim is to explore whether language models merely mirror gender distributions within the corresponding professions or if they exhibit biases tied to their grammatical genders. Results indicate a significant overall predisposition towards the masculine, but specific occupation terms consistently lean toward a particular gender, pointing to a complex interplay of societal and linguistic influences.

Anthology ID: 2024.lrec-main.671
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 7596–7610
URL: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.lrec-main.671/
Cite (ACL): Steinunn Rut Friðriksdóttir and Hafsteinn Einarsson. 2024. Gendered Grammar or Ingrained Bias? Exploring Gender Bias in Icelandic Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7596–7610, Torino, Italia. ELRA and ICCL.
Cite (Informal): Gendered Grammar or Ingrained Bias? Exploring Gender Bias in Icelandic Language Models (Friðriksdóttir & Einarsson, LREC-COLING 2024)
PDF: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.lrec-main.671.pdf