ParsiPy: NLP Toolkit for Historical Persian Texts in Python
Farhan Farsi, Parnian Fazel, Sepand Haghighi, Sadra Sabouri, Farzaneh Goshtasb, Nadia Hajipour, Ehsaneddin Asgari, Hossein Sameti
Abstract
The study of historical languages presents unique challenges due to their complex orthographic systems, fragmentary textual evidence, and the absence of standardized digital representations of text in those languages. Tackling these challenges requires specialized NLP tools that can handle phonetic transcriptions and analyze ancient texts. This work introduces ParsiPy, an NLP toolkit designed to facilitate the analysis of historical Persian languages by offering modules for tokenization, lemmatization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embedding. We demonstrate the utility of our toolkit through the processing of Parsig (Middle Persian) texts, highlighting its potential for expanding computational methods in the study of historical languages. Through this work, we contribute to the field of computational philology, offering tools that can be adapted for the broader study of ancient texts and their digital preservation.
- Anthology ID:
- 2025.alp-1.17
- Volume:
- Proceedings of the Second Workshop on Ancient Language Processing
- Month:
- May
- Year:
- 2025
- Address:
- The Albuquerque Convention Center, Laguna
- Editors:
- Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti, Rachele Sprugnoli
- Venues:
- ALP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 137–149
- URL:
- https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.alp-1.17/
- Cite (ACL):
- Farhan Farsi, Parnian Fazel, Sepand Haghighi, Sadra Sabouri, Farzaneh Goshtasb, Nadia Hajipour, Ehsaneddin Asgari, and Hossein Sameti. 2025. ParsiPy: NLP Toolkit for Historical Persian Texts in Python. In Proceedings of the Second Workshop on Ancient Language Processing, pages 137–149, The Albuquerque Convention Center, Laguna. Association for Computational Linguistics.
- Cite (Informal):
- ParsiPy: NLP Toolkit for Historical Persian Texts in Python (Farsi et al., ALP 2025)
- PDF:
- https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.alp-1.17.pdf