The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns

Bastian Bunzeck; Sina Zarrieß

doi:10.18653/v1/2024.genbench-1.3

Abstract

We introduce SlayQA, a novel benchmark data set designed to evaluate language models’ ability to handle gender-inclusive language, specifically the use of neopronouns, in a question-answering setting. Derived from the Social IQa data set, SlayQA modifies context-question-answer triples to include gender-neutral pronouns, creating a significant linguistic distribution shift in comparison to common pre-training corpora like C4 or Dolma. Our results show that state-of-the-art language models struggle with the challenge, exhibiting small but noticeable performance drops when answering questions containing neopronouns compared to those without.

Anthology ID: 2024.genbench-1.3
Volume: Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Amirhossein Kazemnejad, Christos Christodoulopoulos, Mario Giulianelli, Ryan Cotterell
Venue: GenBench
Publisher: Association for Computational Linguistics
Pages: 42–53
URL: https://aclanthology.org/2024.genbench-1.3
DOI: 10.18653/v1/2024.genbench-1.3
Cite (ACL): Bastian Bunzeck and Sina Zarrieß. 2024. The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns. In Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP, pages 42–53, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns (Bunzeck & Zarrieß, GenBench 2024)
PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.genbench-1.3.pdf