Abstract
This paper addresses the task of Chinese Lexical Simplification (CLS). A key challenge in CLS is the scarcity of data resources. We begin by evaluating the performance of various language models at different scales in unsupervised and few-shot settings, finding that their effectiveness is sensitive to word types. Expensive large language models (LLMs), such as GPT-4, outperform small models in simplifying complex content words and Chinese idioms from the dictionary. To take advantage of this, we propose an automatic knowledge distillation framework called PivotKD for generating training data to fine-tune small models. In addition, all models face difficulties with out-of-dictionary (OOD) words such as internet slang. To address this, we implement a retrieval-based interpretation augmentation (RIA) strategy, injecting word interpretations from external resources into the context. Experimental results demonstrate that fine-tuned small models outperform GPT-4 in simplifying complex content words and Chinese idioms. Additionally, the RIA strategy enhances the performance of most models, particularly in handling OOD words. Our findings suggest that a hybrid approach could optimize CLS performance while managing inference costs. This would involve configuring choices such as model scale, linguistic resources, and the use of RIA based on specific word types to strike an ideal balance.

- Anthology ID: 2024.emnlp-main.849
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 15227–15239
- URL: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.emnlp-main.849/
- DOI: 10.18653/v1/2024.emnlp-main.849
- Cite (ACL): ZiHao Xiao, Jiefu Gong, Shijin Wang, and Wei Song. 2024. Optimizing Chinese Lexical Simplification Across Word Types: A Hybrid Approach. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15227–15239, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): Optimizing Chinese Lexical Simplification Across Word Types: A Hybrid Approach (Xiao et al., EMNLP 2024)
- PDF: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.emnlp-main.849.pdf