ModeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models
Nathan Andrew Chi, Teodor Malchev, Riley Kong, Ryan Andrew Chi, Lucas Huang, Ethan A Chi, R. Thomas McCoy, Dragomir Radev
Abstract
We introduce ModeLing, a novel benchmark of Linguistics Olympiad-style puzzles which tests few-shot reasoning in AI systems. Solving these puzzles necessitates inferring aspects of a language’s grammatical structure from a small number of examples. Such puzzles provide a natural testbed for language models, as they require compositional generalization and few-shot inductive reasoning. Consisting solely of new puzzles written specifically for this work, ModeLing has no risk of appearing in the training data of existing AI systems: this ameliorates the risk of data leakage, a potential confounder for many prior evaluations of reasoning. Evaluating several large open source language models and GPT on our benchmark, we observe non-negligible accuracy, demonstrating few-shot emergent reasoning ability which cannot merely be attributed to shallow memorization. However, imperfect model performance suggests that ModeLing can be used to measure further progress in linguistic reasoning.
- Anthology ID:
- 2025.loresmt-1.10
- Volume:
- Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)
- Month:
- May
- Year:
- 2025
- Address:
- Albuquerque, New Mexico, U.S.A.
- Editors:
- Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jonathan Washington, Nathaniel Oco, Xiaobing Zhao
- Venues:
- LoResMT | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 105–114
- URL:
- https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.loresmt-1.10/
- Cite (ACL):
- Nathan Andrew Chi, Teodor Malchev, Riley Kong, Ryan Andrew Chi, Lucas Huang, Ethan A Chi, R. Thomas McCoy, and Dragomir Radev. 2025. ModeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models. In Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025), pages 105–114, Albuquerque, New Mexico, U.S.A. Association for Computational Linguistics.
- Cite (Informal):
- ModeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models (Chi et al., LoResMT 2025)
- PDF:
- https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.loresmt-1.10.pdf