IndiAnn: A Web-based Annotation Platform for Indic Languages

Bandaru Lavadeep; Ritwik Raghav; Abhik Jana

Abstract

Linguistic annotation tools that work well for non-Indic languages (e.g., English, German, and Spanish) often fail on Indic scripts because of complex Unicode properties, including the visual reordering of vowel matras, conjunct characters, and grapheme clusters that span multiple code points. In this paper, we present IndiAnn, a web-based annotation platform designed for low-resource Indic languages. IndiAnn relies on native browser Unicode rendering, uses offset-based storage that preserves grapheme clusters, and does not force tokenization in the user interface. The tool supports annotation tasks such as part-of-speech (POS) tagging, named entity recognition (NER), dependency relation annotation, and semantic role labelling (SRL); the resulting annotations maintain correct character boundaries and interoperate with standard NLP pipelines and tools. IndiAnn has been tested on Telugu, Hindi, Tamil, Malayalam, Bengali, Odia, Marathi, and Kannada, with no script breakage observed during annotation. To the best of our knowledge, it is the first unified annotation framework that covers this range of core NLP tasks across eight Indic languages. The code is publicly available at https://github.com/Lavadeep/INDIANN.
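The idea of offset-based storage that preserves grapheme clusters can be illustrated with a small Python sketch. It is not taken from the IndiAnn repository: `cluster_boundaries`, `make_span`, and the `Span` record are hypothetical names, and the boundary rule is a simplified approximation of Unicode UAX #29 extended grapheme clusters (vowel matras, other marks, joiners, and virama-linked conjuncts stay attached to their base consonant). In particular, it treats every virama as a conjunct linker, whereas UAX #29 does so only for some scripts. Annotation spans are stored as code-point offsets into the raw, untokenized sentence and widened outward so they never cut a cluster.

```python
import bisect
import unicodedata
from dataclasses import dataclass

# Zero-width non-joiner / joiner: they attach to the preceding cluster.
JOINERS = {"\u200c", "\u200d"}


def _is_mark(ch: str) -> bool:
    # Nonspacing (Mn), spacing (Mc) and enclosing (Me) marks, e.g. vowel matras.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def _is_virama(ch: str) -> bool:
    # Indic viramas have canonical combining class 9.
    return unicodedata.combining(ch) == 9


def cluster_boundaries(text: str) -> list[int]:
    """Code-point offsets where a cluster may start or end.

    Simplified approximation of UAX #29 (an assumption, not the full algorithm):
    never break before a mark or joiner, and never break between a virama and
    a following letter, so consonant conjuncts stay whole.
    """
    if not text:
        return [0]
    bounds = [0]
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if _is_mark(cur) or cur in JOINERS:
            continue  # matra, virama, or joiner belongs to the previous cluster
        if _is_virama(prev) and unicodedata.category(cur) == "Lo":
            continue  # consonant + virama + consonant forms one conjunct
        bounds.append(i)
    bounds.append(len(text))
    return bounds


def clusters(text: str) -> list[str]:
    """Split text into the clusters defined by cluster_boundaries."""
    b = cluster_boundaries(text)
    return [text[s:e] for s, e in zip(b, b[1:])]


@dataclass(frozen=True)
class Span:
    """Hypothetical offset-based annotation record (not IndiAnn's actual schema)."""
    start: int  # code-point offset into the raw sentence (inclusive)
    end: int    # code-point offset (exclusive)
    label: str  # e.g. a POS tag or NER type


def make_span(text: str, start: int, end: int, label: str) -> Span:
    """Widen [start, end) outward to the nearest cluster boundaries."""
    if not 0 <= start <= end <= len(text):
        raise ValueError("offsets out of range")
    b = cluster_boundaries(text)
    s = b[bisect.bisect_right(b, start) - 1]  # largest boundary <= start
    e = b[bisect.bisect_left(b, end)]         # smallest boundary >= end
    return Span(s, e, label)


if __name__ == "__main__":
    sentence = "नमस्ते"            # Hindi: 6 code points, 3 clusters
    print(len(sentence))           # 6
    print(clusters(sentence))      # ['न', 'म', 'स्ते']
    print(sentence[:4])            # naive slice cuts the conjunct: 'नमस्'
    span = make_span(sentence, 0, 4, "INTJ")
    print(span)                    # Span(start=0, end=6, label='INTJ')
    print(sentence[span.start:span.end])  # नमस्ते
```

Because the stored offsets refer to the untouched input string rather than to tokens or rendered glyphs, the same span can be exported to downstream NLP tools without re-segmenting the text; the browser remains responsible for visual reordering of matras at display time.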

Anthology ID: 2026.law-main.10
Volume: Proceedings of the 20th Linguistic Annotation Workshop (LAW XX)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Yang Janet Liu, Luke Gessler
Venues: LAW | WS
Publisher: Association for Computational Linguistics
Pages: 130–145
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.law-main.10/
Cite (ACL): Bandaru Lavadeep, Ritwik Raghav, and Abhik Jana. 2026. IndiAnn: A Web-based Annotation Platform for Indic Languages. In Proceedings of the 20th Linguistic Annotation Workshop (LAW XX), pages 130–145, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): IndiAnn: A Web-based Annotation Platform for Indic Languages (Lavadeep et al., LAW 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.law-main.10.pdf