PhonoThink: Improving Large Language Models’ Reasoning on Chinese Phonological Ambiguities
Jianfei Ma, Zhaoxin Feng, Emmanuele Chersoni, Huacheng Song, Ziqi Zhang
Abstract
Effectively resolving phonological ambiguities is crucial for robust natural language processing, as these ambiguities are pervasive in tasks ranging from speech-to-text and spelling correction to offensive language detection. However, current Large Language Models (LLMs) frequently struggle to resolve such ambiguities. To address this challenge, we present a framework that enhances LLMs’ phonological capability through a multi-stage training approach. Our method begins with supervised fine-tuning on carefully constructed datasets, including three subtask datasets designed to strengthen the model’s foundational phonological knowledge, along with a synthetic dataset of step-by-step reasoning chains. Following this, we apply reinforcement learning to incentivize and stabilize the model’s reasoning. Results show that our framework enables the base model to achieve performance comparable to that of a much larger model. Our ablation studies reveal that the subtask datasets and the synthetic dataset act as complementary modular enhancers, jointly strengthening LLMs’ integrated application of phonological knowledge.
- Anthology ID:
- 2025.emnlp-main.961
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 19018–19033
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.961/
- Cite (ACL):
- Jianfei Ma, Zhaoxin Feng, Emmanuele Chersoni, Huacheng Song, and Ziqi Zhang. 2025. PhonoThink: Improving Large Language Models’ Reasoning on Chinese Phonological Ambiguities. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 19018–19033, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- PhonoThink: Improving Large Language Models’ Reasoning on Chinese Phonological Ambiguities (Ma et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.961.pdf