Exploring Layer-wise Representations of English and Chinese Homonymy in Pre-trained Language Models

Matthew King-Hang Ma, Xie Chenwei, Wenbo Wang, William Shiyuan Wang


Abstract
Homonymy readily gives rise to lexical ambiguity because a single word form carries multiple unrelated senses, and correctly identifying the intended sense depends heavily on the surrounding context. This ambiguous nature makes homonyms an appropriate testbed for examining the contextualization capability of pre-trained language models (PLMs) and large language models (LLMs). Considering the impact of part of speech (POS) on homonym disambiguation and the predominance of English-focused studies in word embedding research, this study extends the analysis to Chinese and provides a comprehensive layer-wise examination of homonym representations in both languages, spanning same- and different-POS categories, across four model families (BERT, GPT-2, Llama 3, Qwen 2.5). Through the creation of a synthetic dataset and computation of a disambiguation score (D-Score), we found that: (1) no universal layer depth excels at differentiating homonym representations; (2) bidirectional models produce better contextualized homonym representations than much larger autoregressive models; and (3) most importantly, POS affects homonym representations in models in ways that differ from findings in human research. The individual differences among LLMs uncovered in our study challenge simplistic accounts of their inner workings. This reveals a compelling research frontier: conducting controlled experiments with purposefully manipulated inputs to enhance the interpretability of LLMs. Our dataset and code are publicly available at https://github.com/neurothew/exploring-homonym-rep-in-llm.
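
As an illustration of the layer-wise analysis sketched in the abstract, the snippet below extracts per-layer representations of a homonym from a bidirectional model and contrasts same-sense versus cross-sense cosine similarity at each layer. This is a minimal, hypothetical sketch using Hugging Face transformers: the paper's actual D-Score formulation and synthetic dataset are not reproduced here, and the choice of bert-base-uncased, the example sentences, and the similarity-gap heuristic are illustrative assumptions only.

```python
# Hypothetical sketch (not the paper's exact D-Score): extract per-layer
# representations of the homonym "bank" from a bidirectional PLM and compare
# same-sense vs. cross-sense cosine similarity at every layer.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "bert-base-uncased"  # any of the studied model families could be swapped in
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True).eval()

def layerwise_target_vectors(sentence: str, target: str) -> torch.Tensor:
    """Return one vector per layer for the first sub-token of `target`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states          # (num_layers + 1) tensors of (1, seq, dim)
    target_id = tok(target, add_special_tokens=False)["input_ids"][0]
    pos = enc["input_ids"][0].tolist().index(target_id)
    return torch.stack([h[0, pos] for h in hidden])  # (num_layers + 1, dim)

# Two contexts sharing one sense of "bank", and one context with the other sense.
river_a = layerwise_target_vectors("She sat on the bank of the river.", "bank")
river_b = layerwise_target_vectors("Reeds grew along the muddy bank.", "bank")
finance = layerwise_target_vectors("He deposited cash at the bank downtown.", "bank")

cos = torch.nn.functional.cosine_similarity
for layer in range(river_a.shape[0]):
    same = cos(river_a[layer], river_b[layer], dim=0).item()
    cross = cos(river_a[layer], finance[layer], dim=0).item()
    # Crude disambiguation-style score: how much closer same-sense uses are
    # than cross-sense uses at this layer (a larger gap means better separation).
    print(f"layer {layer:2d}  same={same:.3f}  cross={cross:.3f}  gap={same - cross:.3f}")
```

Repeating the same per-layer comparison for an autoregressive model (e.g., GPT-2) would allow the kind of layer-by-layer, model-by-model contrast the abstract describes, though the paper's own metric and dataset should be used for any faithful replication.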
Anthology ID:
2025.findings-acl.1011
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
Publisher:
Association for Computational Linguistics
Pages:
19705–19724
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1011/
Cite (ACL):
Matthew King-Hang Ma, Xie Chenwei, Wenbo Wang, and William Shiyuan Wang. 2025. Exploring Layer-wise Representations of English and Chinese Homonymy in Pre-trained Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19705–19724, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Exploring Layer-wise Representations of English and Chinese Homonymy in Pre-trained Language Models (Ma et al., Findings 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1011.pdf