Translating Ancient Chinese to Modern Chinese at Scale: A Large Language Model-based Approach
Jiahuan Cao, Dezhi Peng, Yongxin Shi, Zongyuan Jiang, Lianwen Jin
Abstract
Recently, the emergence of large language models (LLMs) has provided powerful foundation models for a wide range of natural language processing (NLP) tasks. However, the vast majority of the pre-training corpus for most existing LLMs is in English, so their Chinese proficiency falls far behind their English proficiency. Furthermore, ancient Chinese has a much larger vocabulary and a smaller available corpus than modern Chinese, which significantly challenges the generalization capacity of existing LLMs. In this paper, we investigate Ancient-Chinese-to-Modern-Chinese (A2M) translation using LLMs including LLaMA and Ziya. Specifically, to improve the understanding of Chinese texts, we explore vocabulary expansion and incremental pre-training methods based on existing pre-trained LLMs. Subsequently, a large-scale A2M translation dataset with 4M pairs is utilized to finetune the LLMs. Experimental results demonstrate the effectiveness of the proposed method, especially with Ziya-13B, in translating ancient Chinese to modern Chinese. Moreover, we analyze in depth the performance of various LLMs with different strategies, which we believe can benefit further research on LLM-based A2M approaches.
- Anthology ID:
- 2023.alt-1.9
- Volume:
- Proceedings of ALT2023: Ancient Language Translation Workshop
- Month:
- September
- Year:
- 2023
- Address:
- Macau SAR, China
- Venue:
- alt
- Publisher:
- Asia-Pacific Association for Machine Translation
- Pages:
- 61–69
- URL:
- https://aclanthology.org/2023.alt-1.9
- Cite (ACL):
- Jiahuan Cao, Dezhi Peng, Yongxin Shi, Zongyuan Jiang, and Lianwen Jin. 2023. Translating Ancient Chinese to Modern Chinese at Scale: A Large Language Model-based Approach. In Proceedings of ALT2023: Ancient Language Translation Workshop, pages 61–69, Macau SAR, China. Asia-Pacific Association for Machine Translation.
- Cite (Informal):
- Translating Ancient Chinese to Modern Chinese at Scale: A Large Language Model-based Approach (Cao et al., alt 2023)
- PDF:
- https://preview.aclanthology.org/landing_page/2023.alt-1.9.pdf