The Gemma Sutras: Fine-Tuning Gemma 3 for Sanskrit Sandhi Splitting

Samarth P; Sanjay Balaji Mahalingam


Abstract

Sandhi, the phonological merging of morphemes, is a central feature of Sanskrit grammar. While Sandhi formation is well-defined by Pāṇini’s Aṣṭādhyāyī, the reverse task—Sandhi splitting—is substantially more complex due to inherent ambiguity and context-sensitive transformations. Accurate splitting is a critical precursor to tokenization in Sanskrit, which lacks explicit word boundaries and presents densely fused compounds. In this work, we present a data-driven approach, fine-tuning the Gemma-3 4B large language model on a dataset of over 49,000 training and 2,000 test examples of compound words and their morpheme-level decompositions. Leveraging the Unsloth framework with low-rank adaptation (LoRA) and 4-bit quantization, we train the model to predict these splits. Our work yields a scalable, Sandhi-aware system designed to enhance modern NLP pipelines for classical Sanskrit, demonstrating an effective application of LLMs to this linguistic challenge.
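The asymmetry between Sandhi formation and Sandhi splitting described above can be illustrated with a minimal sketch. It encodes only a few well-known vowel Sandhi rules in IAST transliteration (a/ā + a/ā → ā, a/ā + i/ī → e, a/ā + u/ū → o) and is not the paper's model, dataset, or method. The rule table and the function names `join` and `split_candidates` are hypothetical and chosen for illustration only. Joining is deterministic, whereas reversing even a single rule yields several candidate splits, and choosing among them requires lexical and contextual knowledge.

```python
import unicodedata
from itertools import product


def nfc(text):
    """Normalize to NFC so each long vowel (e.g. ā) is a single code point."""
    return unicodedata.normalize("NFC", text)


# Illustrative subset of vowel Sandhi rules (an assumption, far from complete).
# Key: (final vowel of left word, initial vowel of right word) -> merged vowel.
RULES = {}
for left_v, right_v in product(nfc("aā"), nfc("aā")):
    RULES[(left_v, right_v)] = nfc("ā")   # similar vowels lengthen
for left_v, right_v in product(nfc("aā"), nfc("iī")):
    RULES[(left_v, right_v)] = "e"        # guna: a/ā + i/ī -> e
for left_v, right_v in product(nfc("aā"), nfc("uū")):
    RULES[(left_v, right_v)] = "o"        # guna: a/ā + u/ū -> o


def join(left, right):
    """Forward direction: apply the matching rule at the word boundary."""
    left, right = nfc(left), nfc(right)
    merged = RULES.get((left[-1], right[0]))
    if merged is None:
        return left + right               # no rule in this toy table applies
    return left[:-1] + merged + right[1:]


def split_candidates(word):
    """Reverse direction: list every (left, right) pair the rules could have merged."""
    word = nfc(word)
    candidates = []
    for i, ch in enumerate(word):
        for (left_v, right_v), merged in RULES.items():
            if merged == ch:
                candidates.append((word[:i] + left_v, right_v + word[i + 1:]))
    return candidates


print(join("deva", "indra"))              # devendra
for pair in split_candidates("neti"):     # several candidates, including ('na', 'iti')
    print(pair)
```

Even this toy table produces multiple candidates for one short word, most of which are not valid Sanskrit. A learned model has to rank such candidates, which is one way to read the motivation for the data-driven approach in the abstract.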

Anthology ID: 2025.winlp-main.35
Volume: Proceedings of the 9th Widening NLP Workshop
Month: November
Year: 2025
Address: Suzhou, China
Editors: Chen Zhang, Emily Allaway, Hua Shen, Lesly Miculicich, Yinqiao Li, Meryem M'hamdi, Peerat Limkonchotiwat, Richard He Bai, Santosh T.y.s.s., Sophia Simeng Han, Surendrabikram Thapa, Wiem Ben Rim
Venues: WiNLP | WS
Publisher: Association for Computational Linguistics
Pages: 235–241
URL: https://preview.aclanthology.org/ingest-emnlp/2025.winlp-main.35/
Cite (ACL): Samarth P and Sanjay Balaji Mahalingam. 2025. The Gemma Sutras: Fine-Tuning Gemma 3 for Sanskrit Sandhi Splitting. In Proceedings of the 9th Widening NLP Workshop, pages 235–241, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): The Gemma Sutras: Fine-Tuning Gemma 3 for Sanskrit Sandhi Splitting (P & Mahalingam, WiNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.winlp-main.35.pdf
