20Q: Overlap-Free World Knowledge Benchmark for Language Models
Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, Walter Daelemans
Abstract
What do language models know about our world? This question is hard to answer but important to get right. To this end, we introduce 20Q, a novel benchmark that uses the Twenty Questions game to evaluate the world knowledge and common sense of language models. Thanks to our overlap-free benchmark, language models learn the game of Twenty Questions without learning knowledge relevant to the test set. We uncover two intuitive factors influencing the world knowledge of language models: the size of the model and the frequency of the topic in the pre-training data. Moreover, we show that in-context learning is inefficient for evaluating language models’ world knowledge; fine-tuning is necessary to reveal their true capabilities. Lastly, our results show that there is room to improve the world knowledge and common sense of large language models. A potential solution would be to up-sample infrequent topics in the pre-training data of language models.
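For illustration only, the following is a minimal Python sketch of how a Twenty Questions-style world-knowledge probe could be framed as yes/no questions to a language model. The prompt format, the `ask_yes_no` helper, and the toy items are assumptions made for this sketch and do not reflect the benchmark's actual data or evaluation protocol.

```python
# Hypothetical sketch of a Twenty Questions-style yes/no probe.
# The prompt format, helper names, and toy items are illustrative
# assumptions, not the 20Q benchmark's actual data or protocol.

from typing import Callable


def ask_yes_no(model: Callable[[str], str], subject: str, question: str) -> bool:
    """Pose a single yes/no question about a subject and parse the model's answer."""
    prompt = f"Subject: {subject}\nQuestion: {question}\nAnswer (yes or no):"
    reply = model(prompt).strip().lower()
    return reply.startswith("yes")


def accuracy(model: Callable[[str], str], items: list[tuple[str, str, bool]]) -> float:
    """Fraction of (subject, question, gold answer) triples answered correctly."""
    correct = sum(ask_yes_no(model, s, q) == gold for s, q, gold in items)
    return correct / len(items)


if __name__ == "__main__":
    # Toy stand-in for a language model: always answers "yes".
    dummy_model = lambda prompt: "yes"
    toy_items = [
        ("elephant", "Is it an animal?", True),
        ("elephant", "Can it fit in a shoebox?", False),
    ]
    print(f"Toy accuracy: {accuracy(dummy_model, toy_items):.2f}")
```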
- Anthology ID:
- 2022.gem-1.46
- Volume:
- Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates (Hybrid)
- Editors:
- Antoine Bosselut, Khyathi Chandu, Kaustubh Dhole, Varun Gangal, Sebastian Gehrmann, Yacine Jernite, Jekaterina Novikova, Laura Perez-Beltrachini
- Venue:
- GEM
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 494–508
- URL:
- https://aclanthology.org/2022.gem-1.46
- DOI:
- 10.18653/v1/2022.gem-1.46
- Cite (ACL):
- Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, and Walter Daelemans. 2022. 20Q: Overlap-Free World Knowledge Benchmark for Language Models. In Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 494–508, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- 20Q: Overlap-Free World Knowledge Benchmark for Language Models (De Bruyn et al., GEM 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.gem-1.46.pdf