AYN: A Tiny Yet Competitive Indian Legal Language Model Pretrained from Scratch

Mitodru Niyogi, Eric Gaussier, Arnab Bhattacharya


Abstract
Decoder-only Large Language Models (LLMs) are currently the model of choice for many Natural Language Processing (NLP) applications. Through instruction fine-tuning and prompting, such LLMs have been used effectively to solve both general and domain-specific tasks. However, they are costly to train and, to a certain extent, costly to use as well, which raises the question of whether they can be replaced by domain-specific Tiny Language Models (TLMs), which typically contain fewer than 100M parameters. We address this question by comparing an 88M-parameter TLM, pretrained from scratch for 185 A100 GPU hours on a single domain (here, the Indian legal domain) with a domain-specific tokenizer, against LLMs ranging from 1B to 8B parameters on domain-specific tasks. In particular, we show that our legal TLM, Ayn, can outperform LLMs up to 80 times larger on the legal case judgment prediction task, rival LLMs up to 30 times larger on the summarization task, and remain competitive with these larger LLMs on general tasks.
Anthology ID: 2026.lrec-main.839
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 10699–10722
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.839/
Cite (ACL): Mitodru Niyogi, Eric Gaussier, and Arnab Bhattacharya. 2026. AYN: A Tiny Yet Competitive Indian Legal Language Model Pretrained from Scratch. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 10699–10722, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal): AYN: A Tiny Yet Competitive Indian Legal Language Model Pretrained from Scratch (Niyogi et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.839.pdf