Linguistically-Informed Evaluation of LLMs on Acceptability Judgments in a Forced-Choice Paradigm

Ziyue Liu, Nils Reiter


Abstract
Evaluating the grammatical abilities of large language models (LLMs) is important for both NLP and linguistic theory. We investigate their ability to perform acceptability judgments in a forced-choice paradigm. We evaluate a subset of LLMs on 150 minimal sentence pairs sampled from Linguistic Inquiry and categorized using BLiMP linguistic phenomena. Our results show that while LLMs approximate human judgments, performance varies across models and phenomenon types, with stronger alignment on morphosyntactic phenomena than on linguistically and semantically demanding phenomena. Prompting strategies have minimal impact.
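As a rough illustration of the forced-choice setup described in the abstract, the sketch below scores a model on minimal pairs and reports accuracy per phenomenon. It is not the authors' implementation: the `MinimalPair` type, the `choose` callback (a stand-in for an actual LLM call), the alternating presentation order, and the toy sentences are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class MinimalPair(NamedTuple):
    phenomenon: str    # hypothetical category label (e.g. a BLiMP-style phenomenon)
    acceptable: str    # the sentence judged acceptable
    unacceptable: str  # its minimally different, unacceptable counterpart


def forced_choice_accuracy(
    pairs: List[MinimalPair],
    choose: Callable[[str, str], int],
) -> Dict[str, float]:
    """Return per-phenomenon accuracy in a two-alternative forced choice.

    `choose(a, b)` is a placeholder for a model query; it must return 0 if the
    model prefers the first sentence and 1 if it prefers the second.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for i, pair in enumerate(pairs):
        # Alternate which slot holds the acceptable sentence (an assumption of
        # this sketch) so a fixed position preference scores at chance.
        if i % 2 == 0:
            options, gold = (pair.acceptable, pair.unacceptable), 0
        else:
            options, gold = (pair.unacceptable, pair.acceptable), 1
        total[pair.phenomenon] += 1
        if choose(*options) == gold:
            correct[pair.phenomenon] += 1
    return {p: correct[p] / total[p] for p in total}


def always_first(a: str, b: str) -> int:
    """Toy stand-in for a model: always picks the first option."""
    return 0


# Toy pairs invented for illustration; they are not from the paper's dataset.
toy_pairs = [
    MinimalPair("agreement", "The dogs bark.", "The dogs barks."),
    MinimalPair("agreement", "She walks.", "She walk."),
]

scores = forced_choice_accuracy(toy_pairs, always_first)
```

Swapping `always_first` for a function that prompts a real model would yield the kind of per-phenomenon breakdown the abstract refers to, under whatever prompting strategy that function encodes.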
Anthology ID:
2026.acl-srw.103
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1177–1189
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-srw.103/
Cite (ACL):
Ziyue Liu and Nils Reiter. 2026. Linguistically-Informed Evaluation of LLMs on Acceptability Judgments in a Forced-Choice Paradigm. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1177–1189, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Linguistically-Informed Evaluation of LLMs on Acceptability Judgments in a Forced-Choice Paradigm (Liu & Reiter, ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-srw.103.pdf