Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models

Michael Li; Nishant Subramani

Abstract

Large transformer-based language models dominate modern NLP, yet our understanding of how they encode linguistic information relies primarily on studies of early models like BERT and GPT-2. We systematically probe 25 models, from BERT Base to Qwen2.5-7B, focusing on two linguistic properties, lexical identity and inflectional features, across six diverse languages. We find a consistent pattern: inflectional features are linearly decodable throughout the model, while lexical identity is prominent early but weakens progressively with depth. Further analysis of the representation geometry reveals that models with aggressive mid-layer dimensionality compression show reduced steering effectiveness in those layers, even though probe accuracy remains high. Pretraining analysis shows that inflectional structure stabilizes early, while lexical identity representations continue evolving. Taken together, our findings suggest that transformers maintain inflectional features across layers while trading off lexical identity for compact, predictive representations.
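
To make "linearly decodable" concrete, the following is a minimal sketch of a linear probe, not the authors' implementation. Everything in it is an assumption made for illustration: the activations are synthetic stand-ins for one layer's hidden states, the probe is a closed-form ridge regression onto one-hot labels, and the names `fit_linear_probe`, `probe_accuracy`, and `synthetic_layer` are hypothetical.

```python
import numpy as np

def fit_linear_probe(X, y, num_classes, l2=1e-2):
    """Fit a ridge-regression linear probe from activations X to one-hot labels."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    Y = np.eye(num_classes)[y]                      # one-hot targets
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])        # regularized Gram matrix
    return np.linalg.solve(A, Xb.T @ Y)             # weights, shape (dim + 1, classes)

def probe_accuracy(W, X, y):
    """Accuracy of the probe's argmax prediction on held-out activations."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float((np.argmax(Xb @ W, axis=1) == y).mean())

def synthetic_layer(rng, y, num_classes, dim, signal):
    """Hypothetical stand-in for a layer: class means scaled by `signal`, plus noise."""
    means = rng.normal(size=(num_classes, dim))
    return signal * means[y] + rng.normal(size=(len(y), dim))

rng = np.random.default_rng(0)
num_classes, dim, n = 4, 32, 2000
y = rng.integers(0, num_classes, size=n)            # e.g. an inflectional label per token
split = n // 2                                      # first half trains, second half tests

accs = {}
for name, signal in [("strong", 2.0), ("none", 0.0)]:
    X = synthetic_layer(rng, y, num_classes, dim, signal)
    W = fit_linear_probe(X[:split], y[:split], num_classes)
    accs[name] = probe_accuracy(W, X[split:], y[split:])

print(accs)
```

In a real study the synthetic activations would be replaced by hidden states extracted from each layer of a model, and held-out probe accuracy would be compared across layers. A layer where the label is linearly encoded behaves like the "strong" case, while a layer that has discarded it drifts toward chance, as in the "none" case.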

Anthology ID: 2026.acl-long.720
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 15825–15864
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.720/
Cite (ACL): Michael Li and Nishant Subramani. 2026. Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15825–15864, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models (Li & Subramani, ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.720.pdf
Checklist: 2026.acl-long.720.checklist.pdf