Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books
Chen Zhang, Jiuheng Lin, Xiao Liu, Zekai Zhang, Yansong Feng
Abstract
While large language models (LLMs) have shown promise in translating extremely low-resource languages using resources like dictionaries, the effectiveness of grammar books remains debated. This paper investigates the role of grammar books in translating extremely low-resource languages by decomposing it into two key steps: grammar rule retrieval and application. To facilitate the study, we introduce ZhuangRules, a modularized dataset of grammar rules and their corresponding test sentences. Our analysis reveals that rule retrieval constitutes a primary bottleneck in grammar-based translation. Moreover, although LLMs can apply simple rules for translation when explicitly provided, they encounter difficulties in handling more complex rules. To address these challenges, we propose representing grammar rules as code functions, considering their similarities in structure and the benefit of code in facilitating LLM reasoning. Our experiments show that using code rules significantly boosts both rule retrieval and application, ultimately resulting in a 13.1% BLEU improvement in translation.- Anthology ID:
- 2025.acl-long.202
- Volume:
- Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 3977–3997
- Language:
- URL:
- https://preview.aclanthology.org/landing_page/2025.acl-long.202/
- DOI:
- Cite (ACL):
- Chen Zhang, Jiuheng Lin, Xiao Liu, Zekai Zhang, and Yansong Feng. 2025. Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3977–3997, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books (Zhang et al., ACL 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.acl-long.202.pdf