Reasoning or Memorization? Investigating LLMs’ Capability in Restoring Chinese Internet Homophones

Jianfei Ma, Zhaoxin Feng, Huacheng Song, Emmanuele Chersoni, Zheng Chen


Abstract
Chinese homophones, prevalent in Internet culture, introduce rich linguistic twists that are challenging for language models. While native speakers disambiguate them through phonological reasoning and contextual understanding, it remains untested how well LLMs perform on this task, and whether they achieve it through similar reasoning processes or merely by memorizing homophone-original word pairs during training. In this paper, we present HomoP-CN, the first Chinese Internet homophone dataset with systematic perturbations for evaluating LLMs' homophone restoration capabilities. Using this benchmark, we investigated the influence of semantic, phonological, and graphemic features on LLMs' restoration accuracy, measured each model's reliance on memorization during restoration through consistency ratios under controlled perturbations, and assessed the effectiveness of various prompting strategies, including contextual cues, pinyin augmentation, few-shot learning, and chain-of-thought approaches.
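The consistency ratio described above can be illustrated with a minimal sketch: the fraction of items on which a model's restoration stays the same across the original input and a controlled perturbed variant. This is an assumption about the metric's shape based on the abstract, not the paper's actual implementation; all names and the example data are hypothetical.

```python
def consistency_ratio(original_preds, perturbed_preds):
    """Fraction of items where the model's restored word is identical
    before and after a controlled perturbation of the homophone input.
    A high ratio despite perturbation suggests reliance on memorized
    homophone-original pairs rather than phonological reasoning."""
    assert len(original_preds) == len(perturbed_preds)
    if not original_preds:
        return 0.0
    same = sum(1 for o, p in zip(original_preds, perturbed_preds) if o == p)
    return same / len(original_preds)

# Hypothetical example: restorations for four items, before/after perturbation.
orig = ["绷不住了", "谢谢你", "芭比Q了", "姐妹"]
pert = ["绷不住了", "谢谢你", "完蛋了", "集美"]
print(consistency_ratio(orig, pert))  # 0.5
```

Here two of the four restorations are unchanged under perturbation, giving a ratio of 0.5.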
Anthology ID:
2025.knowllm-1.11
Volume:
Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Yuji Zhang, Canyu Chen, Sha Li, Mor Geva, Chi Han, Xiaozhi Wang, Shangbin Feng, Silin Gao, Isabelle Augenstein, Mohit Bansal, Manling Li, Heng Ji
Venues:
KnowLLM | WS
Publisher:
Association for Computational Linguistics
Pages:
120–139
URL:
https://preview.aclanthology.org/landing_page/2025.knowllm-1.11/
Cite (ACL):
Jianfei Ma, Zhaoxin Feng, Huacheng Song, Emmanuele Chersoni, and Zheng Chen. 2025. Reasoning or Memorization? Investigating LLMs’ Capability in Restoring Chinese Internet Homophones. In Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM), pages 120–139, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Reasoning or Memorization? Investigating LLMs’ Capability in Restoring Chinese Internet Homophones (Ma et al., KnowLLM 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.knowllm-1.11.pdf