Abstract
We introduce SlayQA, a novel benchmark data set designed to evaluate language models’ ability to handle gender-inclusive language, specifically the use of neopronouns, in a question-answering setting. Derived from the Social IQa data set, SlayQA modifies context-question-answer triples to include gender-neutral pronouns, creating a significant linguistic distribution shift in comparison to common pre-training corpora like C4 or Dolma. Our results show that state-of-the-art language models struggle with the challenge, exhibiting small but noticeable performance drops when answering questions containing neopronouns compared to those without.
- Anthology ID:
- 2024.genbench-1.3
- Volume:
- Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Amirhossein Kazemnejad, Christos Christodoulopoulos, Mario Giulianelli, Ryan Cotterell
- Venue:
- GenBench
- Publisher:
- Association for Computational Linguistics
- Pages:
- 42–53
- URL:
- https://aclanthology.org/2024.genbench-1.3
- DOI:
- 10.18653/v1/2024.genbench-1.3
- Cite (ACL):
- Bastian Bunzeck and Sina Zarrieß. 2024. The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP, pages 42–53, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns (Bunzeck & Zarrieß, GenBench 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.genbench-1.3.pdf