Reasoning or Memorization? Investigating LLMs’ Capability in Restoring Chinese Internet Homophones
Jianfei Ma, Zhaoxin Feng, Huacheng Song, Emmanuele Chersoni, Zheng Chen
Abstract
Chinese homophones, prevalent in Internet culture, bring rich linguistic twists that are challenging for language models. While native speakers disambiguate them through phonological reasoning and contextual understanding, it remains untested how well LLMs perform on this task and whether LLMs also achieve this via similar reasoning processes or merely through memorization of homophone-original word pairs during training. In this paper, we present HomoP-CN, the first Chinese Internet homophones dataset with systematic perturbations for evaluating LLMs’ homophone restoration capabilities. Using this benchmark, we investigated the influence of semantic, phonological, and graphemic features on LLMs’ restoration accuracy, measured the reliance levels of each model on memorization during restoration through consistency ratios under controlled perturbations, and assessed the effectiveness of various prompting strategies, including contextual cues, pinyin augmentation, few-shot learning, and thought-chain approaches.
- Anthology ID:
- 2025.knowllm-1.11
- Volume:
- Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM)
- Month:
- August
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Yuji Zhang, Canyu Chen, Sha Li, Mor Geva, Chi Han, Xiaozhi Wang, Shangbin Feng, Silin Gao, Isabelle Augenstein, Mohit Bansal, Manling Li, Heng Ji
- Venues:
- KnowLLM | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 120–139
- URL:
- https://preview.aclanthology.org/landing_page/2025.knowllm-1.11/
- Cite (ACL):
- Jianfei Ma, Zhaoxin Feng, Huacheng Song, Emmanuele Chersoni, and Zheng Chen. 2025. Reasoning or Memorization? Investigating LLMs’ Capability in Restoring Chinese Internet Homophones. In Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM), pages 120–139, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Reasoning or Memorization? Investigating LLMs’ Capability in Restoring Chinese Internet Homophones (Ma et al., KnowLLM 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.knowllm-1.11.pdf