Sepand Haghighi
2025
ParsiPy: NLP Toolkit for Historical Persian Texts in Python
Farhan Farsi | Parnian Fazel | Sepand Haghighi | Sadra Sabouri | Farzaneh Goshtasb | Nadia Hajipour | Ehsaneddin Asgari | Hossein Sameti
Proceedings of the Second Workshop on Ancient Language Processing
The study of historical languages presents unique challenges due to their complex orthographic systems, fragmentary textual evidence, and the absence of standardized digital representations of text in those languages. Tackling these challenges needs special NLP digital tools to handle phonetic transcriptions and analyze ancient texts. This work introduces ParsiPy, an NLP toolkit designed to facilitate the analysis of historical Persian languages by offering modules for tokenization, lemmatization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embedding. We demonstrate the utility of our toolkit through the processing of Parsig (Middle Persian) texts, highlighting its potential for expanding computational methods in the study of historical languages. Through this work, we contribute to the field of computational philology, offering tools that can be adapted for the broader study of ancient texts and their digital preservation.