Linguistic Blind Spots of Large Language Models

Jiali Cheng, Hadi Amiri


Abstract
Large language models (LLMs) serve as the foundation of numerous AI applications today. However, despite their remarkable proficiency in generating coherent text, questions linger about their ability to perform fine-grained linguistic annotation tasks, such as detecting nouns or verbs, or identifying more complex syntactic structures like clauses or T-units in input texts. These tasks require precise syntactic and semantic understanding of the input text, and when LLMs underperform on specific linguistic structures, it raises concerns about their reliability for detailed linguistic analysis and about whether their (even correct) outputs truly reflect an understanding of the inputs. In this paper, we empirically study the performance of recent LLMs on fine-grained linguistic annotation tasks. Through a series of experiments, we find that recent LLMs show limited efficacy in addressing linguistic queries and often struggle with linguistically complex inputs. We show that the most capable LLM (Llama3-70b) makes notable errors in detecting linguistic structures, such as misidentifying embedded clauses, failing to recognize verb phrases, and confusing complex nominals with clauses. Our study provides valuable insights to inform future endeavors in LLM design and development.
Anthology ID:
2025.cmcl-1.3
Volume:
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
Tatsuki Kuribayashi, Giulia Rambelli, Ece Takmaz, Philipp Wicke, Jixing Li, Byung-Doh Oh
Venues:
CMCL | WS
Publisher:
Association for Computational Linguistics
Pages:
1–17
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.cmcl-1.3/
Cite (ACL):
Jiali Cheng and Hadi Amiri. 2025. Linguistic Blind Spots of Large Language Models. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 1–17, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Linguistic Blind Spots of Large Language Models (Cheng & Amiri, CMCL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.cmcl-1.3.pdf