Incorporating Lexicon-Aligned Prompting in Large Language Model for Tangut–Chinese Translation

Yuxi Zheng, Jingsong Yu


Abstract
This paper proposes a machine translation approach for Tangut–Chinese using a large language model (LLM) enhanced with lexical knowledge. We fine-tune a Qwen-based LLM using Tangut–Chinese parallel corpora and dictionary definitions. Experimental results demonstrate that incorporating single-character dictionary definitions leads to the best BLEU-4 score of 72.33 for literal translation. Additionally, applying a chain-of-thought prompting strategy significantly boosts free translation performance to 64.20. The model also exhibits strong few-shot learning abilities, with performance improving as the training dataset size increases. Our approach effectively translates both simple and complex Tangut sentences, offering a robust solution for low-resource language translation and contributing to the digital preservation of Tangut texts.
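The abstract describes prompting the LLM with per-character dictionary definitions alongside the sentence to translate, optionally with a chain-of-thought instruction for free translation. As a minimal sketch only: the paper's actual prompt template, wording, and lexicon format are not given in the abstract, so every string and field name below is an illustrative assumption, and the glosses are placeholders rather than real Tangut data.

```python
def build_lexicon_aligned_prompt(tangut_sentence, lexicon, chain_of_thought=False):
    """Build a translation prompt that inlines single-character dictionary
    definitions (a guess at what 'lexicon-aligned prompting' could look like)."""
    gloss_lines = []
    for ch in tangut_sentence:
        definition = lexicon.get(ch)
        if definition:
            gloss_lines.append(f"{ch}: {definition}")
    prompt = "Translate the following Tangut sentence into Chinese.\n"
    prompt += "Dictionary definitions of each character:\n"
    prompt += "\n".join(gloss_lines) + "\n"
    if chain_of_thought:
        # Hypothetical CoT instruction: literal rendering first, then free translation.
        prompt += "First give a literal translation, then reason step by step toward a fluent free translation.\n"
    prompt += f"Sentence: {tangut_sentence}\nTranslation:"
    return prompt

# Toy example; 'A' and 'B' stand in for Tangut characters (U+17000 block).
toy_lexicon = {"A": "sky", "B": "great"}
print(build_lexicon_aligned_prompt("AB", toy_lexicon, chain_of_thought=True))
```

Such a prompt would then serve as the input side of the fine-tuning pairs, with the reference Chinese translation as the target.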
Anthology ID:
2025.alp-1.16
Volume:
Proceedings of the Second Workshop on Ancient Language Processing
Month:
May
Year:
2025
Address:
The Albuquerque Convention Center, Laguna
Editors:
Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti, Rachele Sprugnoli
Venues:
ALP | WS
Publisher:
Association for Computational Linguistics
Pages:
127–136
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.alp-1.16/
Cite (ACL):
Yuxi Zheng and Jingsong Yu. 2025. Incorporating Lexicon-Aligned Prompting in Large Language Model for Tangut–Chinese Translation. In Proceedings of the Second Workshop on Ancient Language Processing, pages 127–136, The Albuquerque Convention Center, Laguna. Association for Computational Linguistics.
Cite (Informal):
Incorporating Lexicon-Aligned Prompting in Large Language Model for Tangut–Chinese Translation (Zheng & Yu, ALP 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.alp-1.16.pdf