Abstract
Several linguistic studies have shown the prevalence of various lexical and grammatical patterns in texts authored by a person of a particular gender, but models for part-of-speech tagging and dependency parsing have still not adapted to account for these differences. To address this, we annotate the Wall Street Journal part of the Penn Treebank with the gender information of the articles’ authors, and build taggers and parsers trained on this data that show performance differences in text written by men and women. Further analyses reveal numerous part-of-speech tags and syntactic relations whose prediction performances benefit from the prevalence of a specific gender in the training data. The results underscore the importance of accounting for gendered differences in syntactic tasks, and outline future venues for developing more accurate taggers and parsers. We release our data to the research community.- Anthology ID:
- P19-1339
- Volume:
- Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Month:
- July
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Anna Korhonen, David Traum, Lluís Màrquez
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 3493–3498
- Language:
- URL:
- https://aclanthology.org/P19-1339
- DOI:
- 10.18653/v1/P19-1339
- Cite (ACL):
- Aparna Garimella, Carmen Banea, Dirk Hovy, and Rada Mihalcea. 2019. Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3493–3498, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing (Garimella et al., ACL 2019)
- PDF:
- https://preview.aclanthology.org/naacl24-info/P19-1339.pdf
- Data
- Penn Treebank