Karthick Narayanan R


2025

Field to Model: Pairing Community Data Collection with Scalable NLP through the LiFE Suite
Karthick Narayanan R | Siddharth Singh | Saurabh Singh | Aryan Mathur | Ritesh Kumar | Shyam Ratan | Bornini Lahiri | Benu Pareek | Neerav Mathur | Amalesh Gope | Meiraba Takhellambam | Yogesh Dawer
Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics

We present LiFE Suite as a “Field-to-Model” pipeline designed to bridge community-centred data collection and scalable language model development. This paper describes the tools integrated into the LiFE Suite that make this unified pipeline possible. Atekho, a mobile-first data collection platform, is designed to empower communities to assert their rights over their data. MATra-Lab, a web-based data processing and annotation tool, supports the management of field data and the creation of NLP-ready datasets, assisted by existing state-of-the-art NLP models. LiFE Model Studio, built on top of Hugging Face AutoTrain, offers a no-code solution for building scalable language models from the field data. This end-to-end integration ensures that every dataset collected in the field retains its linguistic, cultural, and metadata context all the way through to deployable AI models and archive-ready datasets.