PahGen: Generating Ancient Pahlavi Text via Grammar-guided Zero-shot Translation

Farhan Farsi, Parnian Fazel, Farzaneh Goshtasb, Nadia Hajipour, Sadra Sabouri, Ehsaneddin Asgari, Hossein Sameti


Abstract
The Pahlavi language, also known as Middle Persian, is a critical part of Persian cultural and historical heritage, bridging Old Persian and Modern Persian (Farsi). However, due to its limited digital presence and the scarcity of comprehensive linguistic resources, Pahlavi is at risk of extinction. As an early attempt to preserve this language, this study introduces a framework to translate English text into Pahlavi. Our approach combines grammar-guided term extraction with zero-shot translation, leveraging large language models (LLMs) to generate syntactically and semantically accurate Pahlavi sentences. This framework aims to preserve the Pahlavi language and serves as a model for reviving other endangered languages with similar characteristics. Finally, using our framework, we generate a novel dataset of 360 expert-validated parallel English-Pahlavi texts.
Anthology ID:
2025.loresmt-1.16
Volume:
Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, U.S.A.
Editors:
Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jonathan Washington, Nathaniel Oco, Xiaobing Zhao
Venues:
LoResMT | WS
Publisher:
Association for Computational Linguistics
Pages:
171–182
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.loresmt-1.16/
Cite (ACL):
Farhan Farsi, Parnian Fazel, Farzaneh Goshtasb, Nadia Hajipour, Sadra Sabouri, Ehsaneddin Asgari, and Hossein Sameti. 2025. PahGen: Generating Ancient Pahlavi Text via Grammar-guided Zero-shot Translation. In Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025), pages 171–182, Albuquerque, New Mexico, U.S.A. Association for Computational Linguistics.
Cite (Informal):
PahGen: Generating Ancient Pahlavi Text via Grammar-guided Zero-shot Translation (Farsi et al., LoResMT 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.loresmt-1.16.pdf