Linguistic Profiling of Transformer Embedding Geometry

Lucia Domenichelli, Dominique Brunato, Felice Dell’Orletta


Abstract
Transformer language models embed tokens in high-dimensional spaces, but whether geometry reflects linguistic structure remains unclear. We analyse token representations in BERT and GPT-2, selected as canonical encoder-only and decoder-only Transformer architectures, through a linguistically grounded geometric lens. We partition tokens from the UD English-EWT treebank by surface and syntactic features (position, length, POS, head distance and arity) and examine how their representational geometry evolves across layers. We employ complementary diagnostic metrics, including isotropy, linear and nonlinear intrinsic dimensionality, to capture distinct aspects of embedding structure. Our findings reveal that BERT maintains more isotropic and higher-dimensional subspaces, whereas GPT-2 exhibits stronger anisotropy driven by a compact cluster of sentence-initial tokens. Across models, open-class words, longer tokens, and high-arity predicates occupy more isotropic, higher-dimensional manifolds than short function words and pre-head modifiers, indicating that semantic richness and syntactic centrality play a key role in structuring embedding space. Our analysis provides a reusable framework for profiling how linguistic abstractions organize the geometry of Transformer embeddings.
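The profiling idea described in the abstract (group token vectors by a linguistic feature, then compare geometric diagnostics per group) can be illustrated with a minimal sketch. The abstract does not specify which estimators the paper uses, so the two measures below are stand-ins chosen for simplicity: mean pairwise cosine similarity as an anisotropy indicator, and the participation ratio of the covariance spectrum as a linear intrinsic-dimensionality proxy. All function names are hypothetical, and the vectors are synthetic, not BERT or GPT-2 outputs.

```python
import math
import random
from collections import defaultdict


def mean_cosine_similarity(vectors):
    """Average cosine similarity over all distinct pairs.

    Values near 0 suggest a more isotropic cloud; values near 1 suggest
    strong anisotropy (vectors pointing in a shared direction).
    """
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    n = len(unit)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a * b for a, b in zip(unit[i], unit[j]))
    return total / (n * (n - 1) / 2)


def participation_ratio(vectors):
    """Linear dimensionality proxy: (tr C)^2 / tr(C^2) for covariance C.

    Equals the number of dimensions when variance is spread evenly,
    and approaches 1 when a single direction dominates.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    centered = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    cov = [
        [sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    trace = sum(cov[a][a] for a in range(d))
    # C is symmetric, so tr(C^2) is the sum of its squared entries.
    trace_sq = sum(cov[a][b] ** 2 for a in range(d) for b in range(d))
    return trace ** 2 / trace_sq


def profile_by_group(embeddings, labels):
    """Partition vectors by label and compute both diagnostics per group."""
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    return {
        label: {
            "mean_cos": mean_cosine_similarity(vecs),
            "participation_ratio": participation_ratio(vecs),
        }
        for label, vecs in groups.items()
    }


# Synthetic stand-in data (assumption): "NOUN" vectors spread evenly over
# 8 dimensions, "DET" vectors share a large offset and vary in one dimension.
rng = random.Random(0)
dim, per_group = 8, 200
embeddings, labels = [], []
for _ in range(per_group):
    embeddings.append([rng.gauss(0.0, 1.0) for _ in range(dim)])
    labels.append("NOUN")
for _ in range(per_group):
    vec = [5.0] * dim
    vec[0] += rng.uniform(-0.5, 0.5)
    embeddings.append(vec)
    labels.append("DET")

profile = profile_by_group(embeddings, labels)
for label, stats in profile.items():
    print(label, stats)
```

In this toy setup the evenly spread group should show a low mean cosine and a participation ratio close to the ambient dimension, while the offset group should show a mean cosine near 1 and a participation ratio near 1. With real model activations one would typically compute these per layer and use a numerical library rather than pure Python loops.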
Anthology ID:
2026.conll-main.10
Volume:
Proceedings of the 30th Conference on Computational Natural Language Learning
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Claire Bonial, Yevgeni Berzak
Venues:
CoNLL | WS
Publisher:
Association for Computational Linguistics
Pages:
145–164
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.10/
Cite (ACL):
Lucia Domenichelli, Dominique Brunato, and Felice Dell’Orletta. 2026. Linguistic Profiling of Transformer Embedding Geometry. In Proceedings of the 30th Conference on Computational Natural Language Learning, pages 145–164, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Linguistic Profiling of Transformer Embedding Geometry (Domenichelli et al., CoNLL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.10.pdf