Nathan Andrew Chi


2025

ModeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models
Nathan Andrew Chi | Teodor Malchev | Riley Kong | Ryan Andrew Chi | Lucas Huang | Ethan A Chi | R. Thomas McCoy | Dragomir Radev
Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)

We introduce ModeLing, a novel benchmark of Linguistics Olympiad-style puzzles that tests few-shot reasoning in AI systems. Solving these puzzles requires inferring aspects of a language's grammatical structure from a small number of examples. Such puzzles provide a natural testbed for language models, as they demand compositional generalization and few-shot inductive reasoning. Because ModeLing consists solely of new puzzles written specifically for this work, it has no risk of appearing in the training data of existing AI systems; this ameliorates data leakage, a potential confounder in many prior evaluations of reasoning. Evaluating several large open-source language models and GPT on our benchmark, we observe non-negligible accuracy, demonstrating a few-shot emergent reasoning ability that cannot merely be attributed to shallow memorization. However, imperfect model performance suggests that ModeLing can be used to measure further progress in linguistic reasoning.