Abstract
We present a new dataset of quantifier expressions for evaluating the generalization abilities of language models. The dataset contains 18,360 prompts encompassing diverse quantifiers, forming the basis of a new framework for assessing semantic understanding in this domain. We test the effectiveness of our dataset using Pythia models ranging from 410 million to 6.9 billion parameters, showing that quantifier-based tasks can be challenging for current language models. We make our code and data publicly available so that the dataset can be easily extended or updated for different evaluation needs.
- Anthology ID: 2023.genbench-1.15
- Volume: Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Koustuv Sinha, Amirhossein Kazemnejad, Christos Christodoulopoulos, Ryan Cotterell, Elia Bruni
- Venues: GenBench | WS
- Publisher: Association for Computational Linguistics
- Pages: 185–192
- URL: https://aclanthology.org/2023.genbench-1.15
- DOI: 10.18653/v1/2023.genbench-1.15
- Cite (ACL): Leroy Zhifei Wang and Shane Steinert-Threlkeld. 2023. GQG: Generalized Quantifier Generalization - A Dataset for Evaluating Quantifier Semantics Understanding in Language Models. In Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP, pages 185–192, Singapore. Association for Computational Linguistics.
- Cite (Informal): GQG: Generalized Quantifier Generalization - A Dataset for Evaluating Quantifier Semantics Understanding in Language Models (Zhifei Wang & Steinert-Threlkeld, GenBench-WS 2023)
- PDF: https://preview.aclanthology.org/landing_page/2023.genbench-1.15.pdf
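To illustrate how a template-based quantifier-prompt dataset of the kind the abstract describes might be constructed, here is a minimal sketch. The quantifiers, nouns, predicates, and the `make_prompts` helper are all hypothetical examples for illustration; the actual GQG prompt format is defined in the authors' released code and data.

```python
# Hypothetical sketch of quantifier-prompt generation; NOT the authors'
# actual dataset format (see the released GQG code and data for that).
from itertools import product

QUANTIFIERS = ["all", "most", "at least three", "fewer than half of the"]
NOUNS = ["students", "apples"]
PREDICATES = ["are red", "passed the exam"]


def make_prompts(quantifiers, nouns, predicates):
    """Expand every (quantifier, noun, predicate) combination into a prompt."""
    return [f"{q} {n} {p}." for q, n, p in product(quantifiers, nouns, predicates)]


prompts = make_prompts(QUANTIFIERS, NOUNS, PREDICATES)
print(len(prompts))  # 4 quantifiers * 2 nouns * 2 predicates = 16 prompts
```

Scaling such a template grid to more quantifiers and sentence frames is one plausible way a corpus of 18,360 prompts could be produced and later extended.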