The Gemma Sutras: Fine-Tuning Gemma 3 for Sanskrit Sandhi Splitting

Samarth P, Sanjay Balaji Mahalingam


Abstract
Sandhi, the phonological merging of morphemes, is a central feature of Sanskrit grammar. While Sandhi formation is well-defined by Pāṇini’s Aṣṭādhyāyī, the reverse task—Sandhi splitting—is substantially more complex due to inherent ambiguity and context-sensitive transformations. Accurate splitting is a critical precursor to tokenization in Sanskrit, which lacks explicit word boundaries and presents densely fused compounds. In this work, we present a data-driven approach, fine-tuning the Gemma-3 4B large language model on a dataset of over 49,000 training and 2,000 test examples of compound words and their morpheme-level decompositions. Leveraging the Unsloth framework with low-rank adaptation (LoRA) and 4-bit quantization, we train the model to predict these splits. Our work yields a scalable, Sandhi-aware system designed to enhance modern NLP pipelines for classical Sanskrit, demonstrating an effective application of LLMs to this linguistic challenge.
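The data side of this setup can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it turns a compound and its split into a prompt/completion pair for supervised fine-tuning, and scores predictions by exact match. The prompt wording, the " + " separator, the helper names, and the two sample words are all assumptions made for illustration. None of them come from the paper or its dataset. The fine-tuning step itself (Unsloth, LoRA, 4-bit quantization) is left out because it needs a GPU and the model weights.

```python
import unicodedata
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SandhiExample:
    compound: str      # fused surface form, e.g. "devālaya"
    parts: List[str]   # constituent split, e.g. ["deva", "ālaya"]


# Hypothetical prompt template and separator (assumptions, not from the paper).
PROMPT = (
    "Split the following Sanskrit compound into its constituents.\n"
    "Compound: {compound}\n"
    "Split:"
)
SEPARATOR = " + "


def nfc(text: str) -> str:
    """Normalize to NFC so precomposed and combining diacritics compare equal."""
    return unicodedata.normalize("NFC", text)


def to_training_pair(ex: SandhiExample) -> Dict[str, str]:
    """Build one prompt/completion record for supervised fine-tuning."""
    return {
        "prompt": PROMPT.format(compound=nfc(ex.compound)),
        "completion": " " + SEPARATOR.join(nfc(p) for p in ex.parts),
    }


def parse_prediction(text: str) -> List[str]:
    """Read the first line of model output and split it on '+'."""
    stripped = nfc(text).strip()
    if not stripped:
        return []
    first_line = stripped.splitlines()[0]
    return [p.strip() for p in first_line.split("+") if p.strip()]


def exact_match_accuracy(predictions: List[str], examples: List[SandhiExample]) -> float:
    """Fraction of examples whose predicted split equals the gold split exactly."""
    if not examples:
        return 0.0
    hits = sum(
        parse_prediction(pred) == [nfc(p) for p in ex.parts]
        for pred, ex in zip(predictions, examples)
    )
    return hits / len(examples)


# Illustrative textbook-style examples (not taken from the paper's dataset).
EXAMPLES = [
    SandhiExample("devālaya", ["deva", "ālaya"]),    # a + ā -> ā
    SandhiExample("sūryodaya", ["sūrya", "udaya"]),  # a + u -> o
]
```

Exact match on the full split is a strict metric: one wrong boundary counts the whole example as an error. Normalizing to NFC before comparison avoids penalizing outputs that differ only in how diacritics are encoded.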
Anthology ID:
2025.winlp-main.35
Volume:
Proceedings of the 9th Widening NLP Workshop
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Chen Zhang, Emily Allaway, Hua Shen, Lesly Miculicich, Yinqiao Li, Meryem M'hamdi, Peerat Limkonchotiwat, Richard He Bai, Santosh T.y.s.s., Sophia Simeng Han, Surendrabikram Thapa, Wiem Ben Rim
Venues:
WiNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
235–241
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.winlp-main.35/
Cite (ACL):
Samarth P and Sanjay Balaji Mahalingam. 2025. The Gemma Sutras: Fine-Tuning Gemma 3 for Sanskrit Sandhi Splitting. In Proceedings of the 9th Widening NLP Workshop, pages 235–241, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
The Gemma Sutras: Fine-Tuning Gemma 3 for Sanskrit Sandhi Splitting (P & Mahalingam, WiNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.winlp-main.35.pdf