Abstract
Part-of-speech (POS) taggers for Swedish routinely fail on the third-person gender-neutral pronoun “hen”, even though it has been a well-established part of the Swedish language since at least 2014. Besides being a form of gender bias, this failure can have negative effects on other tasks that rely on POS information. We demonstrate the usefulness of semi-synthetic augmented datasets in a case study, retraining a POS tagger to correctly recognize “hen” as a personal pronoun. We evaluate our retrained models both on tag accuracy and on a downstream task (dependency parsing) in a classical NLP pipeline. Our results show that adding such data corrects the disparity in performance. The accuracy rate for identifying “hen” as a pronoun can be brought up to acceptable levels with only minor adjustments to the tagger’s vocabulary files. Performance parity with gendered pronouns can be reached after retraining with only a few hundred examples. This increase in POS tag accuracy also improves dependency parsing of sentences containing “hen”.

- Anthology ID:
- 2023.ltedi-1.8
- Volume:
- Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
- Month:
- September
- Year:
- 2023
- Address:
- Varna, Bulgaria
- Editors:
- Bharathi R. Chakravarthi, B. Bharathi, Josephine Griffith, Kalika Bali, Paul Buitelaar
- Venues:
- LTEDI | WS
- Publisher:
- INCOMA Ltd., Shoumen, Bulgaria
- Pages:
- 54–61
- URL:
- https://aclanthology.org/2023.ltedi-1.8
- Cite (ACL):
- Henrik Björklund and Hannah Devinney. 2023. Computer, enhence: POS-tagging improvements for nonbinary pronoun use in Swedish. In Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion, pages 54–61, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal):
- Computer, enhence: POS-tagging improvements for nonbinary pronoun use in Swedish (Björklund & Devinney, LTEDI-WS 2023)
- PDF:
- https://preview.aclanthology.org/naacl-24-ws-corrections/2023.ltedi-1.8.pdf