ParsiPy: NLP Toolkit for Historical Persian Texts in Python

Farhan Farsi, Parnian Fazel, Sepand Haghighi, Sadra Sabouri, Farzaneh Goshtasb, Nadia Hajipour, Ehsaneddin Asgari, Hossein Sameti


Abstract
The study of historical languages presents unique challenges due to their complex orthographic systems, fragmentary textual evidence, and the absence of standardized digital representations of text in those languages. Tackling these challenges requires specialized digital NLP tools that can handle phonetic transcriptions and analyze ancient texts. This work introduces ParsiPy, an NLP toolkit designed to facilitate the analysis of historical Persian languages by offering modules for tokenization, lemmatization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embedding. We demonstrate the utility of our toolkit through the processing of Parsig (Middle Persian) texts, highlighting its potential for expanding computational methods in the study of historical languages. Through this work, we contribute to the field of computational philology, offering tools that can be adapted for the broader study of ancient texts and their digital preservation.
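As a rough illustration of what the tokenization and lemmatization steps mean for transcribed Parsig, here is a minimal rule-based sketch. It is not ParsiPy's API or method: the function names, the tiny lexicon, and the suffix list (the plural endings -ān and -īhā) are hypothetical choices made for this example, and a real lemmatizer would need a curated lexicon and much richer morphology.

```python
import re
import unicodedata

# Hypothetical toy lexicon of lemmas (illustrative only, not ParsiPy data).
# Entries are NFC-normalized so comparisons with input tokens are stable.
LEXICON = {unicodedata.normalize("NFC", w) for w in ("šāh", "mard", "ud", "man")}

# Hypothetical suffix list for this sketch: two Middle Persian plural endings.
SUFFIXES = ("īhā", "ān")


def tokenize(text: str) -> list[str]:
    """Split a transcribed line into word tokens, dropping punctuation."""
    return re.findall(r"[^\s.,;:!?]+", text)


def lemmatize(token: str) -> str:
    """Return the token if it is a known lemma; otherwise strip the first
    listed suffix that yields a known lemma; otherwise return it unchanged."""
    word = unicodedata.normalize("NFC", token.lower())
    if word in LEXICON:
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in LEXICON:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    tokens = tokenize("šāhān ud mardān.")
    print(tokens)                          # ['šāhān', 'ud', 'mardān']
    print([lemmatize(t) for t in tokens])  # ['šāh', 'ud', 'mard']
```

Suffix stripping is only accepted when the remainder is a known lemma, which keeps the toy rule from mangling unknown words; unknown tokens pass through unchanged.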
Anthology ID:
2025.alp-1.17
Volume:
Proceedings of the Second Workshop on Ancient Language Processing
Month:
May
Year:
2025
Address:
The Albuquerque Convention Center, Laguna
Editors:
Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti, Rachele Sprugnoli
Venues:
ALP | WS
Publisher:
Association for Computational Linguistics
Pages:
137–149
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.alp-1.17/
DOI:
10.18653/v1/2025.alp-1.17
Cite (ACL):
Farhan Farsi, Parnian Fazel, Sepand Haghighi, Sadra Sabouri, Farzaneh Goshtasb, Nadia Hajipour, Ehsaneddin Asgari, and Hossein Sameti. 2025. ParsiPy: NLP Toolkit for Historical Persian Texts in Python. In Proceedings of the Second Workshop on Ancient Language Processing, pages 137–149, The Albuquerque Convention Center, Laguna. Association for Computational Linguistics.
Cite (Informal):
ParsiPy: NLP Toolkit for Historical Persian Texts in Python (Farsi et al., ALP 2025)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.alp-1.17.pdf