Exploring Layer-wise Representations of English and Chinese Homonymy in Pre-trained Language Models
Matthew King-Hang Ma, Xie Chenwei, Wenbo Wang, William Shiyuan Wang
Abstract
Homonymy readily gives rise to lexical ambiguity, since a single word form carries multiple unrelated senses. Correctly identifying the intended sense of a homonym depends heavily on its surrounding context. This ambiguity makes homonyms an appropriate testbed for examining the contextualization capability of pre-trained language models (PLMs) and large language models (LLMs). Given the impact of part of speech (POS) on homonym disambiguation and the prevalence of English-focused studies in word embedding research, this study extends the analysis to Chinese and provides a comprehensive layer-wise examination of homonym representations in both languages, spanning same and different POS categories, across four families of PLMs/LLMs (BERT, GPT-2, Llama 3, Qwen 2.5). By constructing a synthetic dataset and computing a disambiguation score (D-Score), we found that: (1) no single layer depth consistently excels at differentiating homonym representations; (2) bidirectional models produce better contextualized homonym representations than much larger autoregressive models; and (3) most importantly, POS affects homonym representations in models in ways that differ from findings in human research. The individual differences between LLMs uncovered in our study challenge simplistic accounts of their inner workings and point to a compelling research frontier: conducting controlled experiments with purposefully manipulated inputs to enhance the interpretability of LLMs. Our dataset and code are publicly available at https://github.com/neurothew/exploring-homonym-rep-in-llm.
- Anthology ID:
- 2025.findings-acl.1011
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2025
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venues:
- Findings | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 19705–19724
- URL:
- https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1011/
- Cite (ACL):
- Matthew King-Hang Ma, Xie Chenwei, Wenbo Wang, and William Shiyuan Wang. 2025. Exploring Layer-wise Representations of English and Chinese Homonymy in Pre-trained Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19705–19724, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Exploring Layer-wise Representations of English and Chinese Homonymy in Pre-trained Language Models (Ma et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1011.pdf
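As an illustrative companion to the abstract's layer-wise analysis, the sketch below shows one way to extract a homonym's representation at every layer of a bidirectional PLM and contrast two sense-selecting contexts. This is not the authors' released code: the model name, example sentences, and the cosine-based contrast are assumptions for illustration only; the paper's actual D-Score is defined in the paper and repository.

```python
# Minimal sketch (assumptions, not the paper's implementation): extract layer-wise
# contextual representations of a homonym from BERT via Hugging Face transformers
# and contrast two sentences that select different senses.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-uncased"  # any of the studied model families could be swapped in
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def layerwise_word_vectors(sentence: str, target: str) -> torch.Tensor:
    """Return a (num_layers + 1, hidden_size) tensor holding the target word's
    representation at the embedding layer and every transformer layer
    (subword pieces are mean-pooled)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden_states = model(**enc).hidden_states  # tuple of (1, seq_len, hidden)
    # Locate the subword positions belonging to the target word.
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    start = next(i for i in range(len(ids)) if ids[i:i + len(target_ids)] == target_ids)
    positions = list(range(start, start + len(target_ids)))
    return torch.stack([h[0, positions].mean(dim=0) for h in hidden_states])

# Two contexts selecting different senses of the homonym "bank" (hypothetical examples).
vecs_a = layerwise_word_vectors("She deposited the cheque at the bank.", "bank")
vecs_b = layerwise_word_vectors("They had a picnic on the river bank.", "bank")

# Simple per-layer contrast: 1 - cosine similarity (higher = senses better separated).
# This stands in for, and is not identical to, the paper's D-Score.
for layer, (a, b) in enumerate(zip(vecs_a, vecs_b)):
    contrast = 1 - torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    print(f"layer {layer:2d}: sense contrast = {contrast:.3f}")
```

Printing the per-layer contrast makes the layer-wise pattern directly inspectable, which is the kind of evidence the abstract's finding (1) concerns: whether any particular depth consistently separates homonym senses best.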