Rajani Chulyadyo

2026

Nepal Script Text Recognition from Ancient Artifacts: Challenges and Opportunities
Swornim Nakarmi | Sarin Sthapit | Sahil Ratna Tuladhar | Arya Shakya | Bal Krishna Bal | Rajani Chulyadyo
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Nepal Script, a script of significant linguistic, historical, and cultural importance, can be found in ancient artifacts in Nepal. As this script has declined in use, it is now considered an endangered script. For its revival and preservation, it is important to digitize ancient artifacts written in Nepal Script and create an accessible digital dataset. Among such artifacts are stone inscriptions and manuscripts, from which we attempt to recognize texts using Artificial Intelligence techniques. This paper presents our approach to preparing a dataset through an extensive data acquisition method and developing a system that recognizes Nepal Script texts from images. Our system combines the YOLOv8 algorithm with a Convolutional Recurrent Neural Network (CRNN) architecture and Connectionist Temporal Classification (CTC) loss. Our dataset consists of 5,219 text line images from ancient stone inscriptions, manuscripts, and modern handwritten and typed documents. Using an augmented dataset of 41,752 samples, our system achieved a Character Error Rate (CER) of 12.61%. Despite the small training dataset, our model successfully predicted texts not only in new stone inscriptions and manuscripts but also in wooden and copper plate inscriptions. We expect that our contributions will encourage further research on Nepal Script and other Nepalese scripts.
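The abstract names CTC loss but does not say how the network's per-frame outputs are turned into text. One common choice, shown here only as a minimal sketch, is best-path (greedy) decoding: merge consecutive repeated labels, then drop the blank symbol. The function name `ctc_greedy_decode` and the blank index 0 are assumptions made for illustration, not details taken from the paper.

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence into an output label sequence.

    frame_labels: the most likely label index at each time step.
    blank: index of the CTC blank symbol (assumed to be 0 here).
    """
    # Step 1: merge runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks, which only serve as separators.
    return [label for label in collapsed if label != blank]


# Two occurrences of label 1 survive because a blank separates them.
decoded = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0])
```

The blank between repeated labels is what allows a doubled character to be recognized as two characters rather than one.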

2024


Nepal Script Text Recognition Using CRNN CTC Architecture
Swornim Nakarmi | Sarin Sthapit | Arya Shakya | Rajani Chulyadyo | Bal Krishna Bal
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024

Nepal Script (also known as Prachalit Script) is the widely used script of Nepal Bhasa, the native language of the Kathmandu Valley in Nepal. Derived from the Brahmi Script, the Nepal Script was developed in the 9th century and was used extensively until the 20th century, when it was replaced by the Devanagari script. Numerous ancient manuscripts, inscriptions, and documents written in the Nepal Script are still available and contain immense knowledge on architecture, arts, astrology, ayurveda, literature, music, tantrism, etc. Digitizing such documents plays a crucial role in preserving and reviving Nepal Bhasa. This paper presents our work on text recognition for the Nepal Script. The implementation includes a Nepal Script text recognizer based on a CRNN-CTC architecture, aided by line and word segmentation. Leveraging a carefully curated dataset of handwritten and printed texts in the Nepal Script, our work achieved a Character Error Rate (CER) of 6.65% and a Word Error Rate (WER) of 13.11%. The dataset used for this work is available as the Nepal Script Text Dataset on Kaggle. The paper further explores the challenges posed by the complex nature of the script, such as conjuncts, modifiers, and variations, as well as the current state of the script.
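For readers unfamiliar with the reported metrics, the sketch below shows how CER and WER are conventionally computed: the Levenshtein edit distance between reference and prediction, divided by the reference length in characters or words. It is an illustration, not the authors' evaluation code. It counts Unicode code points as characters and splits words on whitespace, both of which are assumptions; for a script with conjuncts and modifiers, a grapheme-based count could give different numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Because a single wrong character makes the whole word wrong, WER is typically higher than CER on the same predictions, which is consistent with the two figures reported above.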
